12 September 2024

Learning to reason with LLMs


OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA). While the work needed to make this new model as easy to use as current models is still ongoing, we are releasing an early version of this model, OpenAI o1‑preview, for immediate use in ChatGPT and to trusted API users.

Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute). The constraints on scaling this approach differ substantially from those of LLM pretraining, and we are continuing to investigate them.

Figure: o1 AIME pass@1 accuracy plotted against train-time compute and against test-time compute (both on a log scale). o1 performance smoothly improves with both train-time and test-time compute.

Evaluations

To highlight the reasoning improvement over GPT‑4o, we tested our models on a diverse set of human exams and ML benchmarks. We show that o1 significantly outperforms GPT‑4o on the vast majority of these reasoning-heavy tasks. Unless otherwise specified, we evaluated o1 at the maximal test-time compute setting.

Competition evals for Math (AIME 2024), Code (CodeForces), and PhD-Level Science Questions (GPQA Diamond). o1 greatly improves over GPT-4o on challenging reasoning benchmarks. Solid bars show pass@1 accuracy and the shaded region shows the performance of majority vote (consensus) with 64 samples.
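The two metrics in this chart can be made concrete with a small sketch. It assumes each problem comes with a list of sampled final answers and one reference answer: pass@1 is estimated as the per-problem fraction of correct samples, averaged over problems, and cons@k counts a problem as solved when the most common answer among its first k samples is correct. The function names and toy data are hypothetical, not the evaluation code behind these results.

```python
from collections import Counter


def pass_at_1(samples: list[list[str]], answers: list[str]) -> float:
    """Mean over problems of the fraction of sampled answers that match the reference."""
    per_problem = [
        sum(s == ans for s in problem_samples) / len(problem_samples)
        for problem_samples, ans in zip(samples, answers)
    ]
    return sum(per_problem) / len(per_problem)


def cons_at_k(samples: list[list[str]], answers: list[str], k: int) -> float:
    """Fraction of problems whose majority answer among the first k samples is correct."""
    solved = 0
    for problem_samples, ans in zip(samples, answers):
        majority, _count = Counter(problem_samples[:k]).most_common(1)[0]
        solved += majority == ans
    return solved / len(answers)


# Hypothetical toy data: 2 problems, 4 sampled answers each.
samples = [["7", "7", "3", "7"], ["12", "12", "5", "9"]]
answers = ["7", "12"]
print(pass_at_1(samples, answers))     # 0.625
print(cons_at_k(samples, answers, 4))  # 1.0
```

In the toy data only half of the second problem's samples are correct, yet the majority answer is still right, which is why consensus scores can sit above pass@1.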

Breakdown of the accuracy and raw scores of gpt-4o vs. o1 on various competition evals. o1 improves over GPT-4o on a wide range of benchmarks, including 54/57 MMLU subcategories. Seven are shown for illustration.

On many reasoning-heavy benchmarks, o1 rivals the performance of human experts. Recent frontier models¹ do so well on MATH² and GSM8K that these benchmarks are no longer effective at differentiating models. We evaluated math performance on AIME, an exam designed to challenge the brightest high school math students in America. On the 2024 AIME exams, GPT‑4o solved on average only 12% (1.8/15) of problems. o1 averaged 74% (11.1/15) with a single sample per problem, 83% (12.5/15) with consensus among 64 samples, and 93% (13.9/15) when re-ranking 1000 samples with a learned scoring function. A score of 13.9 places it among the top 500 students nationally and above the cutoff for the USA Mathematical Olympiad.
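The percentages follow directly from the average number of problems solved out of 15, and the re-ranking setting amounts to best-of-n selection: draw many samples and keep the one a scoring function rates highest. In this minimal sketch the learned scoring function is replaced by a hypothetical stand-in, `toy_score`, because the actual scorer is not described here.

```python
from collections.abc import Callable, Sequence


def rerank_best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Best-of-n re-ranking: return the candidate the scoring function rates highest."""
    return max(candidates, key=score)


def toy_score(answer: str) -> float:
    # Hypothetical stand-in for a learned scorer: prefers answers close to 42.
    return -abs(int(answer) - 42)


print(rerank_best_of_n(["17", "41", "42", "100"], toy_score))  # 42

# Average AIME problems solved (out of 15), converted to the percentages quoted above.
for label, solved in [("GPT-4o", 1.8), ("o1, 1 sample", 11.1),
                      ("o1, consensus of 64", 12.5), ("o1, re-rank 1000", 13.9)]:
    print(f"{label}: {solved}/15 = {solved / 15:.0%}")
```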

We also evaluated o1 on GPQA diamond, a difficult intelligence benchmark that tests for expertise in chemistry, physics, and biology. To compare models to humans, we recruited experts with PhDs to answer GPQA-diamond questions. We found that o1 surpassed the performance of those human experts, becoming the first model to do so on this benchmark. These results do not imply that o1 is more capable than a PhD in all respects, only that the model is more proficient at solving some problems that a PhD would be expected to solve. On several other ML benchmarks, o1 improved over the state of the art. With its vision perception capabilities enabled, o1 scored 78.2% on MMMU, making it the first model to be competitive with human experts. It also outperformed GPT‑4o on 54 out of 57 MMLU subcategories.

Chain of Thought

Similar to how a human may think for a long time before responding to a difficult question, o1 uses a chain of thought when attempting to solve a problem. Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses. It learns to recognize and correct its mistakes. It learns to break down tricky steps into simpler ones. It learns to try a different approach when the current one isn't working. This process dramatically improves the model's ability to reason. To illustrate this leap forward, we show the chain of thought from o1‑preview on several difficult problems below.

GPT-4o

User

oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step

Use the example above to decode:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Let’s break this down step by step based on the example:

1. Example given:
• Input: oyfjdnisdr rtqwainr acxz mynzbhhx
• Output: Think step by step

By examining the words:
• The pattern involves selecting specific letters or transforming them.

2. Now, let’s decode the new phrase:
• Input: oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Step-by-step decoding process:

Identify possible word divisions.

The first phrase might follow the same structure as the example. It seems like the input text could be broken into groups based on natural separations or patterns.

A possible division could be:

oyekaijzdf
aaptcg
suaokybhai
ouow
aqht
mynznvaatzacdfoulxxz

However, to make a meaningful decoding, I would need more context about the transformations or letter shifting that might be involved.

Could you provide any additional decoding rules or transformations used in this cipher?

OpenAI o1-preview

User

oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step

Use the example above to decode:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Thought for 5 seconds

First, what is going on here?

We are given:

First, an example:

"oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step"

We can see that when "oyfjdnisdr rtqwainr acxz mynzbhhx" is transformed to "Think step by step"

Our task is: Use the example above to decode: "oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

So the first part is to figure out how "oyfjdnisdr rtqwainr acxz mynzbhhx" is decoded into "Think step by step"

Observation 1: The phrase "Think step by step" is 18 letters (excluding spaces) . Let's count the letters.

Similarly, let's count the letters in "oyfjdnisdr rtqwainr acxz mynzbhhx"

Original Ciphertext:

oyfjdnisdr rtqwainr acxz mynzbhhx

Plaintext:

Think step by step

First, break down the letters:

Ciphertext letters: o y f j d n i s d r r t q w a i n r a c x z m y n z b h h x

Total letters: Let's count the letters in the ciphertext.

First word: oyfjdnisdr (10 letters)

Second word: rtqwainr (8 letters)

Third word: acxz (4 letters)

Fourth word: mynzbhhx (8 letters)

Total letters in ciphertext: 10 + 8 + 4 + 8 = 30 letters

Plaintext: Think step by step

First word: Think (5 letters)

Second word: step (4 letters)

Third word: by (2 letters)

Fourth word: step (4 letters)

Total letters in plaintext: 5 + 4 + 2 + 4 = 15 letters

Hmm.

But actually in the problem it says the example:

[Ciphertext] -> Think step by step

So is the decoding process that the ciphertext maps to "Think step by step"?

Our goal is to use the example above to decode:

"oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

First, the cipher seems connected to the plaintext.

Given the time constraints, perhaps the easiest way is to try to see patterns.

Option 1: Try to find mapping from letters to letters.

Do any letters match?

First, let's write down the ciphertext and plaintext letters on top of each other.

Ciphertext: o y f j d n i s d r r t q w a i n r a c x z m y n z b h h x

Plaintext: T h i n k s t e p b y s t e p

Wait a minute.

I think maybe there is an anagram or substitution cipher here.

Alternatively, I think that we can notice that each group of ciphertext corresponds to a plaintext word.

Check the number of letters.

First ciphertext word: oyfjdnisdr (10 letters)

First plaintext word: Think (5 letters)

Hmm.

Second ciphertext word: rtqwainr (8 letters)

Second plaintext word: step (4 letters)

Third ciphertext word: acxz (4 letters)

Third plaintext word: by (2 letters)

Fourth ciphertext word: mynzbhhx (8 letters)

Fourth plaintext word: step (4 letters)

Interesting.

It seems that the ciphertext words are exactly twice as long as the plaintext words.

(10 vs 5, 8 vs 4, 4 vs 2, 8 vs 4)

Idea: Maybe we need to take every other letter or rebuild the plaintext from the ciphertext accordingly.

Let's test this theory.

If we take every other letter from the ciphertext words, maybe we'll get the plaintext.

Let's try this for the first word:

Ciphertext word: o y f j d n i s d r (10 letters)

Alternate letters:

First letters (positions 1,3,5,7,9): o f d i d

Second letters (positions 2,4,6,8,10): y j n s r

Wait: Let's try taking every second letter or first letter.

Option 1: Take the odd positions:

Positions: 1 3 5 7 9

Letters: o f d i d

Plaintext word is "Think", letters are T h i n k

Hmm.

Perhaps we can map letters:

o f d i d --> Think

Not directly

Option 2: Try mapping as per an assigned code: perhaps columns of letters?

Alternatively, perhaps the cipher is more complex.

Alternatively, notice that "oyfjdnisdr" has 10 letters and "Think" has 5 letters.

What if each plaintext letter is being represented by two ciphertext letters?

Similarly for other words.

Suppose that each plaintext letter corresponds to pair of letters in the ciphertext.

Meaning that the first two letters of ciphertext map to the first plaintext letter, and so on.

So for the first word, partition the ciphertext letters into pairs:

oy fj dn is dr

Corresponding to plaintext letters T h i n k

So we can attempt to map:

oy -> T

fj -> h

dn -> i

is -> n

dr -> k

If this is the mapping, perhaps each pair of letters maps to one plaintext letter.

Similarly for other words.

Let's test this mapping.

Is there a consistent mapping from pairs to letters?

Let's make a table of pairs and corresponding letters.

First, make the pairs from the first word and their corresponding plaintext letters:

Ciphertext Pairs -> Plaintext Letter

oy -> T

fj -> h

dn -> i

is -> n

dr -> k

Now, what's the mapping from "oy" to 'T'?

Is there a way to map 'oy' to 'T'?

Similarly for others.

One possible way is that 'oy' represents 'T'.

But how?

Option 1: Sum of the letters.

Convert letters to numbers (A=1, B=2, ..., Z=26)

So let's try to get to 'T' from 'o' and 'y'

o = 15

y = 25

Sum: 15 +25 = 40

But 'T' is 20.

Alternatively, perhaps subtract: 25 -15 = 10.

No.

Alternatively, perhaps combine the numbers in some way.

Alternatively, think about their positions in the alphabet.

Alternatively, perhaps the letters are encrypted via a code.

Alternatively, perhaps if we overlay the word 'Think' over the cipher pairs 'oy', 'fj', etc., the cipher is formed by substituting each plaintext letter with two letters.

Alternatively, perhaps consider the 'original' letters.

Let's look at the letters between the word "Think" and "oyfjdnisdr"

Perhaps we can spot that "oyfjdnisdr" is an anagram for "Thinkfiles"?

But that seems unlikely.

Alternatively, think of the pairs.

Suppose we assign numbers to letters: a=1, b=2, ..., z=26

Consider the numbers in 'oy':

o = 15

y = 25

Add them: 15 + 25 = 40

Now, for 'T' we have:

'T' is number 20

What if we take the sum of the cipher pair mod 26.

15 +25 = 40

Then 40 mod 26 = 14

14 corresponds to 'N'

Doesn't match.

Alternatively, see if the cipher pairs correspond to something else.

Alternatively, perhaps the pair 'oy' corresponds to 'T' via key.

Alternatively, perhaps it's as simple as 'oy' is 'T'.

But 'o' corresponds to 'T', subtracting 'o' ASCII code from 'T'.

Now, let's think in terms of symmetric letters.

Alternatively, perhaps we can spot that the letters correspond to the first letters of words.

But perhaps an easier way is to look at the word positions.

Alternatively, perhaps we can see that the ciphertext letters are offsets from the plaintext letters.

So let's try to map the pairs to the plaintext letters.

First pair 'oy' and plaintext letter 'T'

Letter 'o' is 15, 'T' is 20

Letter 'o' + 5 = 'T' (15 +5=20)

But 'y' is 25, which is 5 letters after 'T' (20 +5=25)

Interesting, seems that 'o' +5 = 'T'

Similarly, 'y' +1 = 'Z', which is not 'T'.

Alternatively, maybe the average of the two letters corresponds to 'T'

(15 +25)/2 =20, which is 'T'

Wait a minute, that seems promising.

First pair: 'o' (15) + 'y' (25) = 40

40 /2 =20

20 corresponds to 'T'

So perhaps the average of the letters in the pair corresponds to the plaintext letter.

Let's check this with the second pair.

Second pair: 'fj' corresponding to 'h'

Letters 'f'=6, 'j'=10

Sum: 6+10=16

Average:16/2=8

8 corresponds to 'h' (since 'h' is 8)

Success!

Third pair: 'dn' to 'i'

'd'=4, 'n'=14

Sum:4+14=18

Average:18/2=9

9 corresponds to 'i'(9='i')

But 'i' is 9, so that seems off by 1.

So perhaps we need to think carefully about letters.

Wait, 18/2=9, 9 corresponds to 'I'

So this works.

Fourth pair: 'is' corresponding to 'n'

'i'=9, 's'=19

Sum:9+19=28

Average:28/2=14

14 corresponds to 'n'(14='N')

Yes!

Fifth pair: 'dr' corresponds to 'k'

'd'=4, 'r'=18

Sum:4+18=22

Average:22/2=11

11 corresponds to 'k'(11='K')

Perfect!

So our code is: For each pair, sum their numeric values, divide by 2, get the corresponding letter.

Alternatively, the average of the numeric values of the letters corresponds to the plaintext letter.

This seems to be a valid method.

Now let's test this with the other words.

Second word ciphertext: rtqwainr

Ciphertext pairs:

rt -> s

qw -> ?

ai -> ?

nr -> ?

Corresponding to plaintext letters 's', 't', 'e', 'p'

So let's take the pairs:

1. 'rt' corresponds to 's'

r=18, t=20

Sum:38

Average:19

19 corresponds to 's' (19='S')

Good

2. 'qw' corresponds to 't'

q=17, w=23

Sum:17+23=40

Average:20

20 corresponds to 't' (20='T')

Good

3. 'ai' corresponds to 'e'

a=1, i=9

Sum:10

Average:5

5 corresponds to 'e' (5='E')

Good

4. 'nr' corresponds to 'p'

n=14, r=18

Sum:32

Average:16

16 corresponds to 'p' (16='P')

Perfect!

Third word: acxz (4 letters)

Plaintext word: 'by' (2 letters)

Ciphertext pairs:

a c

x z

Plaintext letters: 'b', 'y'

First pair: 'a', 'c'

a=1, c=3

Sum:4

Average:2

2 corresponds to 'b'(2='B')

Second pair: 'x' and 'z'

x=24, z=26

Sum:50

Average:25

25 corresponds to 'y'(25='Y')

Yes!

Fourth word: mynzbhhx (8 letters)

Plaintext word: 'step' (4 letters)

Ciphertext pairs:

my

nz

bh

hx

Plaintext letters 's','t','e','p'

First pair 'm','y'

m=13, y=25

Sum:38

Average:19

19 corresponds to 's'(19='S')

Second pair: 'n','z'

n=14, z=26

Sum: 40

Average:20

20='t'

Third pair: 'b','h'

b=2, h=8

Sum:10

Average:5

5='e'

Fourth pair: 'h','x'

h=8, x=24

Sum:32

Average:16

16='p'

Got it!

Now, given this, we can decode the given message.

Ciphertext to decode:

"oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

First, we need to divide into words.

Given that the ciphertext in the example had words of length:

-10 letters (5 pairs)
-8 letters (4 pairs)
-4 letters (2 pairs)
-8 letters (4 pairs)

Transported to plaintext words of:

-5 letters
-4 letters
-2 letters
-4 letters

So perhaps in our supplied ciphertext, we need to divide the letters into words accordingly.

Ciphertext: o y e k a i j z d f a a p t c g s u a o k y b h a i o u o w a q h t m y n z n v a a t z a c d f o u l x x z

First, let's write the letters in order:

1) oyekaijzdf (10 letters) -> word of 5 letters

2) aaptcg (6 letters) -> word of 3 letters

3) suaokybhai (10 letters) -> word of 5 letters

4) ouow (4 letters) -> word of 2 letters

5) aqht (4 letters) -> word of 2 letters

6) mynznvaatzacdfoulxxz (22 letters) -> word of 11 letters

Totals:

So the letters are:

First word: oyekaijzdf (10 letters = 5 pairs)

Second word: aaptcg (6 letters=3 pairs)

Third word: suaokybhai (10 letters=5 pairs)

Fourth word: ouow (4 letters=2 pairs)

Fifth word: aqht (4 letters=2 pairs)

Sixth word: mynznvaatzacdfoulxxz (22 letters=11 pairs)

So we can suppose that the plaintext words have lengths:

1) 5 letters

2) 3 letters

3) 5 letters

4) 2 letters

5) 2 letters

6)11 letters

Now we can proceed to decode.

First word: oyekaijzdf

Divide into pairs:

oy

ek

ai

jz

df

So we can find the plaintext letters by averaging the numeric values of the letters in each pair.

First pair: 'o','y'

o=15, y=25

Sum=40

Average=20

20='T'

Second pair: 'e','k'

e=5, k=11

Sum=16

Average=8

8='H'

Third pair: 'a','i'

a=1, i=9

Sum=10

Average=5

5='E'

Fourth pair: 'j','z'

j=10, z=26

Sum=36

Average=18

18='R'

Fifth pair: 'd','f'

d=4, f=6

Sum=10

Average=5

5='E'

So the first word is 'T','H','E','R','E'

So 'THERE'

Second word: aaptcg

Pairs:

aa

pt

cg

First pair: 'a','a'

a=1, a=1

Sum=2

Average=1

1='A'

Second pair: 'p','t'

p=16, t=20

Sum=36

Average=18

18='R'

Third pair: 'c','g'

c=3, g=7

Sum=10

Average=5

5='E'

So second word: 'A','R','E'

Third word: suaokybhai

Pairs:

su

ao

ky

bh

ai

First pair: s=19, u=21

Sum=40, Average=20

20='T'

Second pair: a=1, o=15

Sum=16, Average=8

8='H'

Third pair: k=11, y=25

Sum=36, Average=18

18='R'

Fourth pair: b=2, h=8

Sum=10, Average=5

5='E'

Fifth pair: a=1, i=9

Sum=10, Average=5

5='E'

Third word: 'T','H','R','E','E'

Fourth word: ouow

Pairs: ou, ow

First pair: o=15, u=21

Sum=36, Average=18

18='R'

Second pair: o=15, w=23

Sum=38, Average=19

19='S'

Fourth word: 'R','S'

Fifth word: aqht

Pairs: a q, h t

First pair: a=1, q=17

Sum=18, Average=9

9='I'

Second pair: h=8, t=20

Sum=28, Average=14

14='N'

Fifth word: 'I','N'

Sixth word: mynznvaatzacdfoulxxz (22 letters: 11 pairs)

Pairs:

m y

n z

n v

a a

t z

a c

d f

o u

l x

x z

Let's list them properly.

Wait, earlier I missed some letters there.

Let's re-express the sixth word letters:

m y n z n v a a t z a c d f o u l x x z

Total letters: 22 letters

Pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

Wait, that's 21 letters.

But in total, we have 22 letters; I think I have a mismatch.

Wait, in the problem the sixth word is:

mynznvaatzacdfoulxxz

Let's separate the letters:

m y n z n v a a t z a c d f o u l x x z

Total letters: 22 letters

So pairs would be:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

Wait, 'z' is only one letter at the end.

So maybe the letters are:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z (left alone?)

But then we're missing a letter.

Alternatively, perhaps the 'z' at the end is an extra letter or it's paired with the previous 'x'.

Alternatively, perhaps there is a mistake.

Alternatively, perhaps when we reach the last 'x x z', we should write:

9. l x

10. x z

Total pairs: 11

So updated pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x z

Now total pairs: 10 pairs. But we are supposed to have 11 letters in the plaintext word.

So perhaps I have miscounted and not considered including 'f' or 'z'.

Wait, is there any chance that the sequence is:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

We have 22 letters.

Perhaps the 'x x' and 'z' are overlapping.

But we need 11 pairs.

I think for this to work, we need to create 11 pairs from the 22 letters.

So let's try pairing letters in order, taking two letters at a time.

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z ?

But again, last pair only has one letter 'z'

Alternatively, we might need to adjust the pairs.

Perhaps it's 'ou', 'lx', 'xz', and there is no extra 'x'.

Maybe I miswrote.

Wait, let's write the letters as they come:

m y n z n v a a t z a c d f o u l x x z

Positions:

1 m

2 y

3 n

4 z

5 n

6 v

7 a

8 a

9 t

10 z

11 a

12 c

13 d

14 f

15 o

16 u

17 l

18 x

19 x

20 z

So 20 letters.

Now that makes sense.

Now we can divide into pairs:

Pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x z

Total of 10 pairs, which will correspond to 10 letters.

But earlier we thought the word was 11 letters.

But 10 letters may be acceptable.

Maybe we should proceed with the 10 pairs.

First pair: m y

m=13, y=25

Sum=38

Average=19

19='S'

Second pair: n z

n=14, z=26

Sum=40

Average=20

20='T'

Third pair: n v

n=14, v=22

Sum=36

Average=18

18='R'

Fourth pair: a a

a=1, a=1

Sum=2

Average=1

1='A'

Fifth pair: t z

t=20, z=26

Sum=46

Average=23

23='W'

Sixth pair: a c

a=1, c=3

Sum=4

Average=2

2='B'

Seventh pair: d f

d=4, f=6

Sum=10

Average=5

5='E'

Eighth pair: o u

o=15, u=21

Sum=36

Average=18

18='R'

Ninth pair: l x

l=12, x=24

Sum=36

Average=18

18='R'

Tenth pair: x z

x=24, z=26

Sum=50

Average=25

25='Y'

So the tenth word is 'S', 'T', 'R', 'A', 'W', 'B', 'E', 'R', 'R', 'Y'

Wait that's 10 letters

So the sixth word is 'STRAWBERRY', which makes sense.

So putting all these translated letters together:

First word: 'THERE'

Second word: 'ARE'

Third word: 'THREE'

Fourth word: 'RS' (? Doesn't make sense)

Wait, the fourth word, the pairs were:

ou ow

Pairs:

First pair: o u

o=15, u=21

Sum=36, Average=18

18='R'

Second pair: o w

o=15, w=23

Sum=38, Average=19

19='S'

So we get 'R','S'

Similarly, fifth word 'IN'

'R', 'S' 'I','N'

So the message so far is

'THERE ARE THREE RS IN'

And then 'STRAWBERRY'

Wait, let's put the words together:

'THERE ARE THREE Rs IN STRAWBERRY'

I think 'RS' is 'R's', meaning 'Rs'

So the message is:

'THERE ARE THREE R'S IN STRAWBERRY'

Which is true.

So the decoded message is:

"THERE ARE THREE R'S IN STRAWBERRY"

To decode the given ciphertext using the example provided, we’ll use the pattern identified in the example.

Ciphertext Example:

oyfjdnisdr rtqwainr acxz mynzbhhx

Decoded as:

Think step by step

Decoding Method:

Pair the letters in the ciphertext.
Convert each letter to its numerical position in the alphabet (A=1, B=2, …, Z=26).
Sum the numerical values of each pair.
Compute the average of the sum (divide by 2).
Convert the average back to a letter (1=A, 2=B, …, 26=Z).

Apply this method to the given ciphertext:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Break down the ciphertext into pairs:

1. First word: “oyekaijzdf”
  1. Pairs: oy, ek, ai, jz, df
  2. Decoded letters:
    1. oy → (15+25)/2 = 20 → T
    2. ek → (5+11)/2 = 8 → H
    3. ai → (1+9)/2 = 5 → E
    4. jz → (10+26)/2 = 18 → R
    5. df → (4+6)/2 = 5 → E
  3. Decoded word: THERE
2. Second word: “aaptcg”
  1. Pairs: aa, pt, cg
  2. Decoded letters:
    1. aa → (1+1)/2 = 1 → A
    2. pt → (16+20)/2 = 18 → R
    3. cg → (3+7)/2 = 5 → E
  3. Decoded word: ARE
3. Third word: “suaokybhai”
  1. Pairs: su, ao, ky, bh, ai
  2. Decoded letters:
    1. su → (19+21)/2 = 20 → T
    2. ao → (1+15)/2 = 8 → H
    3. ky → (11+25)/2 = 18 → R
    4. bh → (2+8)/2 = 5 → E
    5. ai → (1+9)/2 = 5 → E
  3. Decoded word: THREE
4. Fourth word: “ouow”
  1. Pairs: ou, ow
  2. Decoded letters:
    1. ou → (15+21)/2 = 18 → R
    2. ow → (15+23)/2 = 19 → S
  3. Decoded letters: R’S
5. Fifth word: “aqht”
  1. Pairs: aq, ht
  2. Decoded letters:
    1. aq → (1+17)/2 = 9 → I
    2. ht → (8+20)/2 = 14 → N
  3. Decoded word: IN
6. Sixth word: “mynznvaatzacdfoulxxz”
  1. Pairs: my, nz, nv, aa, tz, ac, df, ou, lx, xz
  2. Decoded letters:
    1. my → (13+25)/2 = 19 → S
    2. nz → (14+26)/2 = 20 → T
    3. nv → (14+22)/2 = 18 → R
    4. aa → (1+1)/2 = 1 → A
    5. tz → (20+26)/2 = 23 → W
    6. ac → (1+3)/2 = 2 → B
    7. df → (4+6)/2 = 5 → E
    8. ou → (15+21)/2 = 18 → R
    9. lx → (12+24)/2 = 18 → R
    10. xz → (24+26)/2 = 25 → Y
  3. Decoded word: STRAWBERRY

Final Decoded Message:

THERE ARE THREE R'S IN STRAWBERRY

Answer:
THERE ARE THREE R’S IN STRAWBERRY
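The decoding rule reached in the transcript above (pair adjacent letters, convert each to its alphabet position a=1 … z=26, average the pair, convert back to a letter) can be checked mechanically. This is a minimal sketch with a hypothetical function name; rejecting odd-length words and pairs whose sum is odd is a design choice of the sketch, since the example gives no rule for those cases.

```python
def decode_pair_average(ciphertext: str) -> str:
    """Decode each word by averaging the alphabet positions of consecutive letter pairs."""
    decoded_words = []
    for word in ciphertext.lower().split():
        if len(word) % 2:
            raise ValueError(f"odd-length word cannot be split into pairs: {word!r}")
        letters = []
        for i in range(0, len(word), 2):
            a = ord(word[i]) - ord("a") + 1       # position of first letter, a=1
            b = ord(word[i + 1]) - ord("a") + 1   # position of second letter
            if (a + b) % 2:
                raise ValueError(f"pair {word[i:i + 2]!r} has no whole-number average")
            letters.append(chr((a + b) // 2 - 1 + ord("A")))  # back to a capital letter
        decoded_words.append("".join(letters))
    return " ".join(decoded_words)


print(decode_pair_average("oyfjdnisdr rtqwainr acxz mynzbhhx"))
# THINK STEP BY STEP
print(decode_pair_average("oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"))
# THERE ARE THREE RS IN STRAWBERRY
```

The function reproduces the training example exactly, which is the same consistency check the model performs before applying the rule to the new ciphertext. The apostrophe in "R'S" is the model's reading of the decoded letters "RS", not part of the cipher.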

Coding

We trained a model that scored 213 points and ranked in the 49th percentile in the 2024 International Olympiad in Informatics (IOI), by initializing from o1 and training to further improve programming skills. This model competed in the 2024 IOI under the same conditions as the human contestants. It had ten hours to solve six challenging algorithmic problems and was allowed 50 submissions per problem.

For each problem, our system sampled many candidate submissions and submitted 50 of them based on a test-time selection strategy. Submissions were selected based on performance on the IOI public test cases, model-generated test cases, and a learned scoring function. Had we instead submitted at random, we would have scored only 156 points on average, suggesting that this strategy was worth nearly 60 points (213 versus 156) under competition constraints.
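A test-time selection step of this kind can be sketched as ranking a pool of candidates and keeping the first 50. The sketch assumes each candidate already carries the three signals named above; the `Candidate` fields and the lexicographic ordering (public tests first, then generated tests, then the learned score) are hypothetical, since the text does not say how the signals were combined.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical per-candidate signals mirroring the three named in the text.
    code: str
    public_tests_passed: int      # IOI public test cases passed
    generated_tests_passed: int   # model-generated test cases passed
    learned_score: float          # output of a learned scoring function


def select_submissions(candidates: list[Candidate], budget: int = 50) -> list[Candidate]:
    """Rank candidates lexicographically by the three signals and keep the top `budget`."""
    ranked = sorted(
        candidates,
        key=lambda c: (c.public_tests_passed, c.generated_tests_passed, c.learned_score),
        reverse=True,
    )
    return ranked[:budget]


pool = [
    Candidate("solution_a", 3, 10, 0.2),
    Candidate("solution_b", 5, 7, 0.1),
    Candidate("solution_c", 5, 7, 0.9),
]
print([c.code for c in select_submissions(pool, budget=2)])  # ['solution_c', 'solution_b']
```

A weighted sum of the signals would be an equally plausible design; the lexicographic key is used here only because it is easy to read.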

With a relaxed submission constraint, we found that model performance improved significantly. When allowed 10,000 submissions per problem, the model achieved a score of 362.14, above the gold medal threshold, even without any test-time selection strategy.

Finally, we simulated competitive programming contests hosted by Codeforces to demonstrate this model's coding skill. Our evaluations closely matched competition rules and allowed for 10 submissions. GPT‑4o achieved an Elo rating³ of 808, which is in the 11th percentile of human competitors. This model far exceeded both GPT‑4o and o1, achieving an Elo rating of 1807 and performing better than 93% of competitors.
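To give a feel for what a rating gap of roughly 1000 points means, the standard Elo expected-score formula can be evaluated on the two ratings quoted above. This is a generic illustration: Codeforces' actual rating calculation may differ in its details, so treat the number as the textbook Elo interpretation only.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B (base-10 logistic, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Ratings quoted in the text: the IOI-tuned model (1807) versus GPT-4o (808).
print(f"{elo_expected_score(1807, 808):.3f}")  # 0.997
```

Under the textbook model, the higher-rated player would be expected to win almost every head-to-head comparison at this gap.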

Figure: Codeforces Elo ratings by model: GPT-4o 808 (11th percentile), o1-preview 1258 (62nd percentile), o1 1673 (89th percentile), and o1-ioi 1807 (93rd percentile).

Further fine-tuning on programming competitions improves o1. The improved model ranked in the 49th percentile in the 2024 International Olympiad in Informatics under competition rules.

Human preference evaluation

In addition to exams and academic benchmarks, we also evaluated human preference of o1‑preview vs GPT‑4o on challenging, open-ended prompts spanning a broad range of domains. In this evaluation, human trainers were shown anonymized responses to a prompt from o1‑preview and GPT‑4o, and voted for the response they preferred. o1‑preview is preferred to gpt-4o by a large margin in reasoning-heavy categories like data analysis, coding, and math. However, o1‑preview is not preferred on some natural language tasks, suggesting that it is not well suited to all use cases.
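Pairwise votes like these are usually summarized as a win rate per category. A minimal sketch, assuming ties are excluded and using a Wilson score interval to show how uncertain a rate is at a given sample size; the vote counts below are hypothetical and are not results from this evaluation.

```python
import math


def win_rate_with_interval(wins: int, losses: int, z: float = 1.96) -> tuple[float, float, float]:
    """Win rate of model A over model B with a Wilson score interval (ties excluded)."""
    n = wins + losses
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, center - half_width, center + half_width


# Hypothetical vote counts for a single category.
rate, low, high = win_rate_with_interval(wins=70, losses=30)
print(f"win rate {rate:.2f}, 95% interval [{low:.2f}, {high:.2f}]")
```

The Wilson interval is chosen over the simple normal approximation because it stays inside [0, 1] even for lopsided categories with few votes.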

Safety

Chain of thought reasoning provides new opportunities for alignment and safety. We found that integrating our policies for model behavior into the chain of thought of a reasoning model is an effective way to robustly teach human values and principles. By teaching the model our safety rules and how to reason about them in context, we found evidence of reasoning capability directly benefiting model robustness: o1‑preview achieved substantially improved performance on key jailbreak evaluations and on our hardest internal benchmarks for evaluating our model's safety refusal boundaries. We believe that using a chain of thought offers significant advances for safety and alignment because (1) it enables us to observe the model's thinking in a legible way, and (2) the model's reasoning about safety rules is more robust to out-of-distribution scenarios.

To stress-test our improvements, we conducted a suite of safety tests and red-teaming before deployment, in accordance with our Preparedness Framework. We found that chain of thought reasoning contributed to capability improvements across our evaluations. Of particular note, we observed interesting instances of reward hacking. Detailed results from these evaluations can be found in the accompanying System Card.

Metric	GPT-4o	o1-preview
% Safe completions on harmful prompts (Standard)	0.990	0.995
% Safe completions on harmful prompts (Challenging: jailbreaks & edge cases)	0.714	0.934
↳ Harassment (severe)	0.845	0.900
↳ Exploitative sexual content	0.483	0.949
↳ Sexual content involving minors	0.707	0.931
↳ Advice about non-violent wrongdoing	0.688	0.961
↳ Advice about violent wrongdoing	0.778	0.963
% Safe completions for the top 200 prompts with the highest Moderation API scores per category in WildChat (Zhao et al. 2024)	0.945	0.971
Goodness@0.1 StrongREJECT jailbreak eval (Souly et al. 2024)	0.220	0.840
Human-sourced jailbreak eval	0.770	0.960
% Compliance on internal benign edge cases ("not over-refusal")	0.910	0.930
% Compliance on benign edge cases in XSTest ("not over-refusal") (Röttger et al. 2023)	0.924	0.976
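Because the safe-completion rates are close to 1, the improvement is easier to read as a reduction in the unsafe rate (1 minus the safe rate). This short sketch does that arithmetic for two rows copied from the table; the helper name is hypothetical, and it applies only to the "% safe completions" rows, not to the other score types in the table.

```python
def unsafe_reduction(safe_before: float, safe_after: float) -> float:
    """How many times smaller the unsafe-completion rate (1 - safe rate) becomes."""
    return (1 - safe_before) / (1 - safe_after)


# (GPT-4o, o1-preview) safe-completion rates copied from two rows of the safety table.
rows = {
    "Challenging: jailbreaks & edge cases": (0.714, 0.934),
    "Exploitative sexual content": (0.483, 0.949),
}
for name, (gpt4o, o1_preview) in rows.items():
    print(f"{name}: unsafe rate {1 - gpt4o:.3f} -> {1 - o1_preview:.3f} "
          f"({unsafe_reduction(gpt4o, o1_preview):.1f}x lower)")
```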

Hiding the Chains of Thought

We believe that a hidden chain of thought presents a unique opportunity for monitoring models. Assuming it is faithful and legible, the hidden chain of thought allows us to "read the mind" of the model and understand its thought process. For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user. However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.

Therefore, after weighing multiple factors including user experience, competitive advantage, and the option to pursue chain of thought monitoring, we have decided not to show the raw chains of thought to users. We acknowledge this decision has disadvantages. We strive to partially make up for it by teaching the model to reproduce any useful ideas from the chain of thought in the answer. For the o1 model series, we show a model-generated summary of the chain of thought.

Conclusion

o1 significantly advances the state of the art in AI reasoning. We plan to release improved versions of this model as we continue iterating. We expect these new reasoning capabilities to improve our ability to align models with human values and principles. We believe o1 and its successors will unlock many new use cases for AI in science, coding, math, and related fields. We are excited for users and API developers to discover how it can improve their daily work.

Appendix A

Dataset	Metric	gpt-4o	o1-preview	o1
Competition Math AIME (2024)	cons@64	13.4	56.7	83.3
Competition Math AIME (2024)	pass@1	9.3	44.6	74.4
Competition Code CodeForces	Elo	808	1258	1673
Competition Code CodeForces	Percentile	11.0	62.0	89.0
GPQA Diamond	cons@64	56.1	78.3	78.0
GPQA Diamond	pass@1	50.6	73.3	77.3
Biology	cons@64	63.2	73.7	68.4
Biology	pass@1	61.6	65.9	69.2
Chemistry	cons@64	43.0	60.2	65.6
Chemistry	pass@1	40.2	59.9	64.7
Physics	cons@64	68.6	89.5	94.2
Physics	pass@1	59.5	89.4	92.8
MATH	pass@1	60.3	85.5	94.8
MMLU	pass@1	88.0	92.3	90.8
MMMU (val)	pass@1	69.1	n/a	78.2
MathVista (testmini)	pass@1	63.8	n/a	73.9

Authors

OpenAI


References

1. https://www.anthropic.com/news/claude-3-5-sonnet, https://deepmind.google/technologies/gemini/pro
2. Our evaluations used the same 500-problem test split found in https://arxiv.org/abs/2305.20050
3. https://codeforces.com/blog/entry/68288