September 12, 2024

Learning to reason with LLMs

We are introducing OpenAI o1, a new large language model trained with reinforcement learning to perform complex reasoning. o1 thinks before it answers: it can produce a long internal chain of thought before responding to the user.

OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA). While the work needed to make this new model as easy to use as current models is still ongoing, we are releasing an early version of this model, OpenAI o1‑preview, for immediate use in ChatGPT and to trusted API users.

Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute). The constraints on scaling this approach differ substantially from those of LLM pretraining, and we are continuing to investigate them.

The image shows two scatter plots comparing "o1 AIME accuracy" during training and at test time. Both plots have "pass@1 accuracy" on the y-axis and compute (log scale) on the x-axis. The points show accuracy increasing with more compute.

o1 performance improves smoothly with both train-time and test-time compute

Evals

To highlight the reasoning improvement over GPT‑4o, we tested our models on a diverse set of human exams and ML benchmarks. We show that o1 significantly outperforms GPT‑4o on the vast majority of these reasoning-heavy tasks. Unless otherwise specified, we evaluated o1 on the maximal test-time compute setting.

Competition evals for Math (AIME 2024), Code (CodeForces), and PhD-Level Science Questions (GPQA Diamond): o1 greatly improves over GPT-4o on challenging reasoning benchmarks. Solid bars show pass@1 accuracy and the shaded region shows the performance of majority vote (consensus) with 64 samples.

Breakdown of the accuracy and raw score of gpt-4o vs. o1 on various competition evals: o1 improves over GPT-4o on a wide range of benchmarks, including 54/57 MMLU subcategories. Seven are shown for illustration.

In many reasoning-heavy benchmarks, o1 rivals the performance of human experts. Recent frontier models¹ do so well on MATH² and GSM8K that these benchmarks are no longer effective at differentiating models. We evaluated math performance on AIME, an exam designed to challenge the brightest high school math students in America. On the 2024 AIME exams, GPT‑4o solved on average only 12% (1.8/15) of problems. o1 averaged 74% (11.1/15) with a single sample per problem, 83% (12.5/15) with consensus among 64 samples, and 93% (13.9/15) when re-ranking 1000 samples with a learned scoring function. A score of 13.9 places it among the top 500 students nationally and above the cutoff for the USA Mathematical Olympiad.
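
The difference between single-sample accuracy (pass@1) and consensus among several samples can be illustrated with a short Python sketch. This is only an illustration under simple assumptions, not the evaluation code used for these results: the helper names `pass_at_1` and `consensus_answer` are hypothetical, answers are compared as exact strings, and the learned re-ranking function is not modeled.

```python
from collections import Counter
from typing import Sequence


def pass_at_1(samples: Sequence[str], correct: str) -> float:
    """Fraction of individual samples that equal the correct answer.

    This estimates the chance that a single sampled answer is right.
    """
    if not samples:
        raise ValueError("need at least one sample")
    return sum(s == correct for s in samples) / len(samples)


def consensus_answer(samples: Sequence[str]) -> str:
    """Most frequent answer among the samples (majority vote).

    Ties are broken in favor of the answer that appeared first,
    which is an arbitrary choice made for this sketch.
    """
    if not samples:
        raise ValueError("need at least one sample")
    return Counter(samples).most_common(1)[0][0]


def consensus_correct(samples: Sequence[str], correct: str) -> bool:
    """True if the majority-vote answer equals the correct answer."""
    return consensus_answer(samples) == correct
```

With consensus, a problem counts as solved whenever the correct answer is the most common one, even if many individual samples are wrong, which is why a consensus score can exceed pass@1 on the same set of samples.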

We also evaluated o1 on GPQA diamond, a difficult intelligence benchmark that tests for expertise in chemistry, physics, and biology. To compare models to humans, we recruited experts with PhDs to answer GPQA-diamond questions. We found that o1 surpassed the performance of those human experts, becoming the first model to do so on this benchmark. These results do not imply that o1 is more capable than a PhD in all respects, only that the model is more proficient at solving some problems that a PhD would be expected to solve. On several other ML benchmarks, o1 improved over the state of the art. With its vision perception capabilities enabled, o1 scored 78.2% on MMMU, making it the first model to be competitive with human experts. It also outperformed GPT‑4o on 54 out of 57 MMLU subcategories.

Chain of Thought

Similar to how a human may think for a long time before responding to a difficult question, o1 uses a chain of thought when attempting to solve a problem. Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses. It learns to recognize and correct its mistakes. It learns to break down tricky steps into simpler ones. It learns to try a different approach when the current one isn't working. This process dramatically improves the model's ability to reason. To illustrate this leap forward, we showcase the chain of thought from o1‑preview on several difficult problems below.

GPT-4o

User

oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step

Use the example above to decode:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Let’s break this down step by step based on the example:

1. Example given:
• Input: oyfjdnisdr rtqwainr acxz mynzbhhx
• Output: Think step by step

By examining the words:
• The pattern involves selecting specific letters or transforming them.

2. Now, let’s decode the new phrase:
• Input: oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Step-by-step decoding process:

Identify possible word divisions.

The first phrase might follow the same structure as the example. It seems like the input text could be broken into groups based on natural separations or patterns.

A possible division could be:

oyekaijzdf
aaptcg
suaokybhai
ouow
aqht
mynznvaatzacdfoulxxz

However, to make a meaningful decoding, I would need more context about the transformations or letter shifting that might be involved.

Could you provide any additional decoding rules or transformations used in this cipher?

OpenAI o1-preview

User

oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step

Use the example above to decode:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Thought for 5 seconds

First, what is going on here?

We are given:

First, an example:

"oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step"

We can see that when "oyfjdnisdr rtqwainr acxz mynzbhhx" is transformed to "Think step by step"

Our task is: Use the example above to decode: "oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

So the first part is to figure out how "oyfjdnisdr rtqwainr acxz mynzbhhx" is decoded into "Think step by step"

Observation 1: The phrase "Think step by step" is 18 letters (excluding spaces) . Let's count the letters.

Similarly, let's count the letters in "oyfjdnisdr rtqwainr acxz mynzbhhx"

Original Ciphertext:

oyfjdnisdr rtqwainr acxz mynzbhhx

Plaintext:

Think step by step

First, break down the letters:

Ciphertext letters: o y f j d n i s d r r t q w a i n r a c x z m y n z b h h x

Total letters: Let's count the letters in the ciphertext.

First word: oyfjdnisdr (10 letters)

Second word: rtqwainr (8 letters)

Third word: acxz (4 letters)

Fourth word: mynzbhhx (8 letters)

Total letters in ciphertext: 10 + 8 + 4 + 8 = 30 letters

Plaintext: Think step by step

First word: Think (5 letters)

Second word: step (4 letters)

Third word: by (2 letters)

Fourth word: step (4 letters)

Total letters in plaintext: 5 + 4 + 2 + 4 = 15 letters

Hmm.

But actually in the problem it says the example:

[Ciphertext] -> Think step by step

So is the decoding process that the ciphertext maps to "Think step by step"?

Our goal is to use the example above to decode:

"oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

First, the cipher seems connected to the plaintext.

Given the time constraints, perhaps the easiest way is to try to see patterns.

Option 1: Try to find mapping from letters to letters.

Do any letters match?

First, let's write down the ciphertext and plaintext letters on top of each other.

Ciphertext: o y f j d n i s d r r t q w a i n r a c x z m y n z b h h x

Plaintext: T h i n k s t e p b y s t e p

Wait a minute.

I think maybe there is an anagram or substitution cipher here.

Alternatively, I think that we can notice that each group of ciphertext corresponds to a plaintext word.

Check the number of letters.

First ciphertext word: oyfjdnisdr (10 letters)

First plaintext word: Think (5 letters)

Hmm.

Second ciphertext word: rtqwainr (8 letters)

Second plaintext word: step (4 letters)

Third ciphertext word: acxz (4 letters)

Third plaintext word: by (2 letters)

Fourth ciphertext word: mynzbhhx (8 letters)

Fourth plaintext word: step (4 letters)

Interesting.

It seems that the ciphertext words are exactly twice as long as the plaintext words.

(10 vs 5, 8 vs 4, 4 vs 2, 8 vs 4)

Idea: Maybe we need to take every other letter or rebuild the plaintext from the ciphertext accordingly.

Let's test this theory.

If we take every other letter from the ciphertext words, maybe we'll get the plaintext.

Let's try this for the first word:

Ciphertext word: o y f j d n i s d r (10 letters)

Alternate letters:

First letters (positions 1,3,5,7,9): o f d i d

Second letters (positions 2,4,6,8,10): y j n s r

Wait: Let's try taking every second letter or first letter.

Option 1: Take the odd positions:

Positions: 1 3 5 7 9

Letters: o f d i d

Plaintext word is "Think", letters are T h i n k

Hmm.

Perhaps we can map letters:

o f d i d --> Think

Not directly

Option 2: Try mapping as per an assigned code: perhaps columns of letters?

Alternatively, perhaps the cipher is more complex.

Alternatively, notice that "oyfjdnisdr" has 10 letters and "Think" has 5 letters.

What if each plaintext letter is being represented by two ciphertext letters?

Similarly for other words.

Suppose that each plaintext letter corresponds to pair of letters in the ciphertext.

Meaning that the first two letters of ciphertext map to the first plaintext letter, and so on.

So for the first word, partition the ciphertext letters into pairs:

oy fj dn is dr

Corresponding to plaintext letters T h i n k

So we can attempt to map:

oy -> T

fj -> h

dn -> i

is -> n

dr -> k

If this is the mapping, perhaps each pair of letters maps to one plaintext letter.

Similarly for other words.

Let's test this mapping.

Is there a consistent mapping from pairs to letters?

Let's make a table of pairs and corresponding letters.

First, make the pairs from the first word and their corresponding plaintext letters:

Ciphertext Pairs -> Plaintext Letter

oy -> T

fj -> h

dn -> i

is -> n

dr -> k

Now, what's the mapping from "oy" to 'T'?

Is there a way to map 'oy' to 'T'?

Similarly for others.

One possible way is that 'oy' represents 'T'.

But how?

Option 1: Sum of the letters.

Convert letters to numbers (A=1, B=2, ..., Z=26)

So let's try to get to 'T' from 'o' and 'y'

o = 15

y = 25

Sum: 15 +25 = 40

But 'T' is 20.

Alternatively, perhaps subtract: 25 -15 = 10.

No.

Alternatively, perhaps combine the numbers in some way.

Alternatively, think about their positions in the alphabet.

Alternatively, perhaps the letters are encrypted via a code.

Alternatively, perhaps if we overlay the word 'Think' over the cipher pairs 'oy', 'fj', etc., the cipher is formed by substituting each plaintext letter with two letters.

Alternatively, perhaps consider the 'original' letters.

Let's look at the letters between the word "Think" and "oyfjdnisdr"

Perhaps we can spot that "oyfjdnisdr" is an anagram for "Thinkfiles"?

But that seems unlikely.

Alternatively, think of the pairs.

Suppose we assign numbers to letters: a=1, b=2, ..., z=26

Consider the numbers in 'oy':

o = 15

y = 25

Add them: 15 + 25 = 40

Now, for 'T' we have:

'T' is number 20

What if we take the sum of the cipher pair mod 26.

15 +25 = 40

Then 40 mod 26 = 14

14 corresponds to 'N'

Doesn't match.

Alternatively, see if the cipher pairs correspond to something else.

Alternatively, perhaps the pair 'oy' corresponds to 'T' via key.

Alternatively, perhaps it's as simple as 'oy' is 'T'.

But 'o' corresponds to 'T', subtracting 'o' ASCII code from 'T'.

Now, let's think in terms of symmetric letters.

Alternatively, perhaps we can spot that the letters correspond to the first letters of words.

But perhaps an easier way is to look at the word positions.

Alternatively, perhaps we can see that the ciphertext letters are offsets from the plaintext letters.

So let's try to map the pairs to the plaintext letters.

First pair 'oy' and plaintext letter 'T'

Letter 'o' is 15, 'T' is 20

Letter 'o' + 5 = 'T' (15 +5=20)

But 'y' is 25, which is 5 letters after 'T' (20 +5=25)

Interesting, seems that 'o' +5 = 'T'

Similarly, 'y' +1 = 'Z', which is not 'T'.

Alternatively, maybe the average of the two letters corresponds to 'T'

(15 +25)/2 =20, which is 'T'

Wait a minute, that seems promising.

First pair: 'o' (15) + 'y' (25) = 40

40 /2 =20

20 corresponds to 'T'

So perhaps the average of the letters in the pair corresponds to the plaintext letter.

Let's check this with the second pair.

Second pair: 'fj' corresponding to 'h'

Letters 'f'=6, 'j'=10

Sum: 6+10=16

Average:16/2=8

8 corresponds to 'h' (since 'h' is 8)

Success!

Third pair: 'dn' to 'i'

'd'=4, 'n'=14

Sum:4+14=18

Average:18/2=9

9 corresponds to 'i'(9='i')

But 'i' is 9, so that seems off by 1.

So perhaps we need to think carefully about letters.

Wait, 18/2=9, 9 corresponds to 'I'

So this works.

Fourth pair: 'is' corresponding to 'n'

'i'=9, 's'=19

Sum:9+19=28

Average:28/2=14

14 corresponds to 'n'(14='N')

Yes!

Fifth pair: 'dr' corresponds to 'k'

'd'=4, 'r'=18

Sum:4+18=22

Average:22/2=11

11 corresponds to 'k'(11='K')

Perfect!

So our code is: For each pair, sum their numeric values, divide by 2, get the corresponding letter.

Alternatively, the average of the numeric values of the letters corresponds to the plaintext letter.

This seems to be a valid method.

Now let's test this with the other words.

Second word ciphertext: rtqwainr

Ciphertext pairs:

rt -> s

qw -> ?

ai -> ?

nr -> ?

Corresponding to plaintext letters 's', 't', 'e', 'p'

So let's take the pairs:

1. 'rt' corresponds to 's'

r=18, t=20

Sum:38

Average:19

19 corresponds to 's' (19='S')

Good

2. 'qw' corresponds to 't'

q=17, w=23

Sum:17+23=40

Average:20

20 corresponds to 't' (20='T')

Good

3. 'ai' corresponds to 'e'

a=1, i=9

Sum:10

Average:5

5 corresponds to 'e' (5='E')

Good

4. 'nr' corresponds to 'p'

n=14, r=18

Sum:32

Average:16

16 corresponds to 'p' (16='P')

Perfect!

Third word: acxz (4 letters)

Plaintext word: 'by' (2 letters)

Ciphertext pairs:

a c

x z

Plaintext letters: 'b', 'y'

First pair: 'a', 'c'

a=1, c=3

Sum:4

Average:2

2 corresponds to 'b'(2='B')

Second pair: 'x' and 'z'

x=24, z=26

Sum:50

Average:25

25 corresponds to 'y'(25='Y')

Yes!

Fourth word: mynzbhhx (8 letters)

Plaintext word: 'step' (4 letters)

Ciphertext pairs:

my

nz

bh

hx

Plaintext letters 's','t','e','p'

First pair 'm','y'

m=13, y=25

Sum:38

Average:19

19 corresponds to 's'(19='S')

Second pair: 'n','z'

n=14, z=26

Sum: 40

Average:20

20='t'

Third pair: 'b','h'

b=2, h=8

Sum:10

Average:5

5='e'

Fourth pair: 'h','x'

h=8, x=24

Sum:32

Average:16

16='p'

Got it!

Now, given this, we can decode the given message.

Ciphertext to decode:

"oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

First, we need to divide into words.

Given that the ciphertext in the example had words of length:

-10 letters (5 pairs)
-8 letters (4 pairs)
-4 letters (2 pairs)
-8 letters (4 pairs)

Transported to plaintext words of:

-5 letters
-4 letters
-2 letters
-4 letters

So perhaps in our supplied ciphertext, we need to divide the letters into words accordingly.

Ciphertext: o y e k a i j z d f a a p t c g s u a o k y b h a i o u o w a q h t m y n z n v a a t z a c d f o u l x x z

First, let's write the letters in order:

1) oyekaijzdf (10 letters) -> word of 5 letters

2) aaptcg (6 letters) -> word of 3 letters

3) suaokybhai (10 letters) -> word of 5 letters

4) ouow (4 letters) -> word of 2 letters

5) aqht (4 letters) -> word of 2 letters

6) mynznvaatzacdfoulxxz (22 letters) -> word of 11 letters

Totals:

So the letters are:

First word: oyekaijzdf (10 letters = 5 pairs)

Second word: aaptcg (6 letters=3 pairs)

Third word: suaokybhai (10 letters=5 pairs)

Fourth word: ouow (4 letters=2 pairs)

Fifth word: aqht (4 letters=2 pairs)

Sixth word: mynznvaatzacdfoulxxz (22 letters=11 pairs)

So we can suppose that the plaintext words have lengths:

1) 5 letters

2) 3 letters

3) 5 letters

4) 2 letters

5) 2 letters

6)11 letters

Now we can proceed to decode.

First word: oyekaijzdf

Divide into pairs:

oy

ek

ai

jz

df

So we can find the plaintext letters by averaging the numeric values of the letters in each pair.

First pair: 'o','y'

o=15, y=25

Sum=40

Average=20

20='T'

Second pair: 'e','k'

e=5, k=11

Sum=16

Average=8

8='H'

Third pair: 'a','i'

a=1, i=9

Sum=10

Average=5

5='E'

Fourth pair: 'j','z'

j=10, z=26

Sum=36

Average=18

18='R'

Fifth pair: 'd','f'

d=4, f=6

Sum=10

Average=5

5='E'

So the first word is 'T','H','E','R','E'

So 'THERE'

Second word: aaptcg

Pairs:

aa

pt

cg

First pair: 'a','a'

a=1, a=1

Sum=2

Average=1

1='A'

Second pair: 'p','t'

p=16, t=20

Sum=36

Average=18

18='R'

Third pair: 'c','g'

c=3, g=7

Sum=10

Average=5

5='E'

So second word: 'A','R','E'

Third word: suaokybhai

Pairs:

su

ao

ky

bh

ai

First pair: s=19, u=21

Sum=40, Average=20

20='T'

Second pair: a=1, o=15

Sum=16, Average=8

8='H'

Third pair: k=11, y=25

Sum=36, Average=18

18='R'

Fourth pair: b=2, h=8

Sum=10, Average=5

5='E'

Fifth pair: a=1, i=9

Sum=10, Average=5

5='E'

Third word: 'T','H','R','E','E'

Fourth word: ouow

Pairs: ou, ow

First pair: o=15, u=21

Sum=36, Average=18

18='R'

Second pair: o=15, w=23

Sum=38, Average=19

19='S'

Fourth word: 'R','S'

Fifth word: aqht

Pairs: a q, h t

First pair: a=1, q=17

Sum=18, Average=9

9='I'

Second pair: h=8, t=20

Sum=28, Average=14

14='N'

Fifth word: 'I','N'

Sixth word: mynznvaatzacdfoulxxz (22 letters: 11 pairs)

Pairs:

m y

n z

n v

a a

t z

a c

d f

o u

l x

x z

Let's list them properly.

Wait, earlier I missed some letters there.

Let's re-express the sixth word letters:

m y n z n v a a t z a c d f o u l x x z

Total letters: 22 letters

Pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

Wait, that's 21 letters.

But in total, we have 22 letters; I think I have a mismatch.

Wait, in the problem the sixth word is:

mynznvaatzacdfoulxxz

Let's separate the letters:

m y n z n v a a t z a c d f o u l x x z

Total letters: 22 letters

So pairs would be:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

Wait, 'z' is only one letter at the end.

So maybe the letters are:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z (left alone?)

But then we're missing a letter.

Alternatively, perhaps the 'z' at the end is an extra letter or it's paired with the previous 'x'.

Alternatively, perhaps there is a mistake.

Alternatively, perhaps when we reach the last 'x x z', we should write:

9. l x

10. x z

Total pairs: 11

So updated pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x z

Now total pairs: 10 pairs. But we are supposed to have 11 letters in the plaintext word.

So perhaps I have miscounted and not considered including 'f' or 'z'.

Wait, is there any chance that the sequence is:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

We have 22 letters.

Perhaps the 'x x' and 'z' are overlapping.

But we need 11 pairs.

I think for this to work, we need to create 11 pairs from the 22 letters.

So let's try pairing letters in order, taking two letters at a time.

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z ?

But again, last pair only has one letter 'z'

Alternatively, we might need to adjust the pairs.

Perhaps it's 'ou', 'lx', 'xz', and there is no extra 'x'.

Maybe I miswrote.

Wait, let's write the letters as they come:

m y n z n v a a t z a c d f o u l x x z

Positions:

1 m

2 y

3 n

4 z

5 n

6 v

7 a

8 a

9 t

10 z

11 a

12 c

13 d

14 f

15 o

16 u

17 l

18 x

19 x

20 z

So 20 letters.

Now that makes sense.

Now we can divide into pairs:

Pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x z

Total of 10 pairs, which will correspond to 10 letters.

But earlier we thought the word was 11 letters.

But 10 letters may be acceptable.

Maybe we should proceed with the 10 pairs.

First pair: m y

m=13, y=25

Sum=38

Average=19

19='S'

Second pair: n z

n=14, z=26

Sum=40

Average=20

20='T'

Third pair: n v

n=14, v=22

Sum=36

Average=18

18='R'

Fourth pair: a a

a=1, a=1

Sum=2

Average=1

1='A'

Fifth pair: t z

t=20, z=26

Sum=46

Average=23

23='W'

Sixth pair: a c

a=1, c=3

Sum=4

Average=2

2='B'

Seventh pair: d f

d=4, f=6

Sum=10

Average=5

5='E'

Eighth pair: o u

o=15, u=21

Sum=36

Average=18

18='R'

Ninth pair: l x

l=12, x=24

Sum=36

Average=18

18='R'

Tenth pair: x z

x=24, z=26

Sum=50

Average=25

25='Y'

So the tenth word is 'S', 'T', 'R', 'A', 'W', 'B', 'E', 'R', 'R', 'Y'

Wait that's 10 letters

So the sixth word is 'STRAWBERRY', which makes sense.

So putting all these translated letters together:

First word: 'THERE'

Second word: 'ARE'

Third word: 'THREE'

Fourth word: 'RS' (? Doesn't make sense)

Wait, the fourth word, the pairs were:

ou ow

Pairs:

First pair: o u

o=15, u=21

Sum=36, Average=18

18='R'

Second pair: o w

o=15, w=23

Sum=38, Average=19

19='S'

So we get 'R','S'

Similarly, fifth word 'IN'

'R', 'S' 'I','N'

So the message so far is

'THERE ARE THREE RS IN'

And then 'STRAWBERRY'

Wait, let's put the words together:

'THERE ARE THREE Rs IN STRAWBERRY'

I think 'RS' is 'R's', meaning 'Rs'

So the message is:

'THERE ARE THREE R'S IN STRAWBERRY'

Which is true.

So the decoded message is:

"THERE ARE THREE R'S IN STRAWBERRY"

To decode the given ciphertext using the example provided, we’ll use the pattern identified in the example.

Ciphertext Example:

oyfjdnisdr rtqwainr acxz mynzbhhx

Decoded as:

Think step by step

Decoding Method:

Pair the letters in the ciphertext.
Convert each letter to its numerical position in the alphabet (A=1, B=2, …, Z=26).
Sum the numerical values of each pair.
Compute the average of the sum (divide by 2).
Convert the average back to a letter (1=A, 2=B, …, 26=Z).

Apply this method to the given ciphertext:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Break down the ciphertext into pairs:

1. First word: “oyekaijzdf”
   1. Pairs: oy, ek, ai, jz, df
   2. Decoded letters:
      1. oy → (15+25)/2 = 20 → T
      2. ek → (5+11)/2 = 8 → H
      3. ai → (1+9)/2 = 5 → E
      4. jz → (10+26)/2 = 18 → R
      5. df → (4+6)/2 = 5 → E
   3. Decoded word: THERE
2. Second word: “aaptcg”
   1. Pairs: aa, pt, cg
   2. Decoded letters:
      1. aa → (1+1)/2 = 1 → A
      2. pt → (16+20)/2 = 18 → R
      3. cg → (3+7)/2 = 5 → E
   3. Decoded word: ARE
3. Third word: “suaokybhai”
   1. Pairs: su, ao, ky, bh, ai
   2. Decoded letters:
      1. su → (19+21)/2 = 20 → T
      2. ao → (1+15)/2 = 8 → H
      3. ky → (11+25)/2 = 18 → R
      4. bh → (2+8)/2 = 5 → E
      5. ai → (1+9)/2 = 5 → E
   3. Decoded word: THREE
4. Fourth word: “ouow”
   1. Pairs: ou, ow
   2. Decoded letters:
      1. ou → (15+21)/2 = 18 → R
      2. ow → (15+23)/2 = 19 → S
   3. Decoded letters: R’S
5. Fifth word: “aqht”
   1. Pairs: aq, ht
   2. Decoded letters:
      1. aq → (1+17)/2 = 9 → I
      2. ht → (8+20)/2 = 14 → N
   3. Decoded word: IN
6. Sixth word: “mynznvaatzacdfoulxxz”
   1. Pairs: my, nz, nv, aa, tz, ac, df, ou, lx, xz
   2. Decoded letters:
      1. my → (13+25)/2 = 19 → S
      2. nz → (14+26)/2 = 20 → T
      3. nv → (14+22)/2 = 18 → R
      4. aa → (1+1)/2 = 1 → A
      5. tz → (20+26)/2 = 23 → W
      6. ac → (1+3)/2 = 2 → B
      7. df → (4+6)/2 = 5 → E
      8. ou → (15+21)/2 = 18 → R
      9. lx → (12+24)/2 = 18 → R
      10. xz → (24+26)/2 = 25 → Y
   3. Decoded word: STRAWBERRY

Final Decoded Message:

THERE ARE THREE R'S IN STRAWBERRY

Answer:
THERE ARE THREE R’S IN STRAWBERRY
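
The decoding method in the answer above (pair the letters, average their alphabet positions, convert back to a letter) can be sketched in a few lines of Python. This is a minimal illustration rather than code from the original post; the function names `decode_word` and `decode` are hypothetical, and the sketch assumes the ciphertext contains only ASCII letters separated by spaces.

```python
def decode_word(word: str) -> str:
    """Decode one ciphertext word by averaging each pair of letters.

    Letters are numbered a=1 ... z=26. Each consecutive pair of
    ciphertext letters is averaged to give one plaintext letter.
    """
    if len(word) % 2 != 0:
        raise ValueError(f"ciphertext word {word!r} has an odd number of letters")
    letters = []
    for i in range(0, len(word), 2):
        a = ord(word[i].lower()) - ord("a") + 1
        b = ord(word[i + 1].lower()) - ord("a") + 1
        total = a + b
        if total % 2 != 0:
            # An odd sum has no whole-number average, so it cannot be a letter.
            raise ValueError(f"pair {word[i:i + 2]!r} does not average to a letter")
        letters.append(chr(total // 2 - 1 + ord("A")))
    return "".join(letters)


def decode(ciphertext: str) -> str:
    """Decode a space-separated ciphertext, one word at a time."""
    return " ".join(decode_word(w) for w in ciphertext.split())


if __name__ == "__main__":
    print(decode("oyfjdnisdr rtqwainr acxz mynzbhhx"))
    print(decode("oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"))
```

Note that the sketch returns the fourth word as the raw letters RS; reading it as R'S is an interpretation step that the model makes in its answer, not part of the arithmetic.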

Coding

We trained a model that scored 213 points and ranked in the 49th percentile in the 2024 International Olympiad in Informatics (IOI), by initializing from o1 and training to further improve programming skills. This model competed in the 2024 IOI under the same conditions as the human contestants. It had ten hours to solve six challenging algorithmic problems and was allowed 50 submissions per problem.

For each problem, our system sampled many candidate submissions and submitted 50 of them based on a test-time selection strategy. Submissions were selected based on performance on the IOI public test cases, model-generated test cases, and a learned scoring function. If we had instead submitted at random, we would have scored only 156 points on average, suggesting that this strategy was worth nearly 60 points under competition constraints.
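
As a rough illustration of what a test-time selection step could look like, the sketch below ranks candidates by the three signals named above and keeps the best 50. The post does not describe how the signals were actually combined, so everything here is an assumption: the `Candidate` fields, the `select_submissions` name, and the lexicographic ordering (public tests first, then generated tests, then the learned score) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    source: str                 # candidate program text
    public_pass_rate: float     # fraction of public test cases passed, in [0, 1]
    generated_pass_rate: float  # fraction of model-generated test cases passed, in [0, 1]
    learned_score: float        # output of a learned scoring function (hypothetical scale)


def select_submissions(candidates: List[Candidate], budget: int = 50) -> List[Candidate]:
    """Return up to `budget` candidates, best first.

    Assumption of this sketch: signals are compared lexicographically,
    so public test results dominate, generated tests break ties, and the
    learned score breaks any remaining ties.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    ranked = sorted(
        candidates,
        key=lambda c: (c.public_pass_rate, c.generated_pass_rate, c.learned_score),
        reverse=True,
    )
    return ranked[:budget]
```

A weighted sum of the three signals would be an equally plausible design; the lexicographic order is used here only because it needs no tuning constants.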

With a relaxed submission constraint, we found that model performance improved significantly. When allowed 10,000 submissions per problem, the model achieved a score of 362.14, above the gold medal threshold, even without any test-time selection strategy.

Finally, we simulated competitive programming contests hosted by Codeforces to demonstrate this model's coding skill. Our evaluations closely matched competition rules and allowed for 10 submissions. GPT‑4o achieved an Elo rating³ of 808, which is in the 11th percentile of human competitors. This model far exceeded both GPT‑4o and o1: it achieved an Elo rating of 1807, performing better than 93% of competitors.
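
For readers unfamiliar with Elo ratings, the sketch below shows the standard Elo expected-score formula, which gives a sense of what a rating gap means. It is offered only as background: Codeforces computes ratings with its own variant of the system (see footnote 3), and the function name `elo_expected_score` is hypothetical.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under standard Elo.

    Uses the classic logistic curve with a 400-point scale:
    E_A = 1 / (1 + 10 ** ((R_B - R_A) / 400)).
    A value of 0.5 means the two players are evenly matched.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Ratings reported in the text for the fine-tuned model and GPT-4o.
    print(elo_expected_score(1807, 808))
```

Under this formula, equal ratings give an expected score of 0.5, and the expected scores of the two players always sum to 1.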

The image shows a bar chart comparing Codeforces Elo ratings and percentiles for different models. GPT-4o has an Elo of 808 (11th percentile), o1-preview has an Elo of 1258 (62nd percentile), o1 has an Elo of 1673 (89th percentile), and o1-ioi has an Elo of 1807 (93rd percentile).

Further fine-tuning on programming competitions improves o1. The improved model ranked in the 49th percentile in the 2024 International Olympiad in Informatics under competition rules.

Human preference evaluation

In addition to exams and academic benchmarks, we also evaluated human preference of o1‑preview vs. GPT‑4o on challenging, open-ended prompts in a broad spectrum of domains. In this evaluation, human trainers were shown anonymized responses to a prompt from o1‑preview and GPT‑4o, and voted for the response they preferred. o1‑preview is preferred to GPT‑4o by a large margin in reasoning-heavy categories like data analysis, coding, and math. However, o1‑preview is not preferred on some natural language tasks, suggesting that it is not well suited for all use cases.

The image shows a horizontal bar chart comparing five models' scores, with error bars representing confidence intervals. The x-axis ranges from 0 to 100, with a dashed line as a reference point for performance.

Safety

Chain of thought reasoning provides new opportunities for alignment and safety. We found that integrating our policies for model behavior into the chain of thought of a reasoning model is an effective way to robustly teach human values and principles. By teaching the model our safety rules and how to reason about them in context, we found evidence of reasoning capability directly benefiting model robustness: o1‑preview achieved substantially improved performance on key jailbreak evaluations and on our hardest internal benchmarks for evaluating our model's safety refusal boundaries. We believe that using a chain of thought offers significant advances for safety and alignment because (1) it enables us to observe the model's thinking in a legible way, and (2) the model's reasoning about safety rules is more robust to out-of-distribution scenarios.

To stress-test our improvements, we conducted a suite of safety tests and red-teaming before deployment, in accordance with our Preparedness Framework. We found that chain of thought reasoning contributed to capability improvements across our evaluations. Of particular note, we observed interesting instances of reward hacking. Detailed results from these evaluations can be found in the accompanying System Card.

Metric	GPT-4o	o1-preview
% Safe completions on harmful prompts Standard	0.990	0.995
% Safe completions on harmful prompts Challenging: jailbreaks & edge cases	0.714	0.934
↳ Harassment (severe)	0.845	0.900
↳ Exploitative sexual content	0.483	0.949
↳ Sexual content involving minors	0.707	0.931
↳ Advice about non-violent wrongdoing	0.688	0.961
↳ Advice about violent wrongdoing	0.778	0.963
% Safe completions for top 200 with highest Moderation API scores per category in WildChat Zhao, et al. 2024	0.945	0.971
Goodness@0.1 StrongREJECT jailbreak eval Souly et al. 2024	0.220	0.840
Human-sourced jailbreak eval	0.770	0.960
% Compliance on internal benign edge cases “not over-refusal”	0.910	0.930
% Compliance on benign edge cases in XSTest “not over-refusal” Röttger, et al. 2023	0.924	0.976

Hiding the Chains of Thought

We believe that a hidden chain of thought presents a unique opportunity for monitoring models. Assuming it is faithful and legible, the hidden chain of thought allows us to "read the mind" of the model and understand its thought process. For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user. However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.

Therefore, after weighing multiple factors including user experience, competitive advantage, and the option to pursue chain of thought monitoring, we have decided not to show the raw chains of thought to users. We acknowledge this decision has disadvantages. We strive to partially make up for it by teaching the model to reproduce any useful ideas from the chain of thought in the answer. For the o1 model series, we show a model-generated summary of the chain of thought.

Conclusion

o1 significantly advances the state of the art in AI reasoning. We plan to release improved versions of this model as we continue iterating. We expect these new reasoning capabilities will improve our ability to align models to human values and principles. We believe o1 and its successors will unlock many new use cases for AI in science, coding, math, and related fields. We are excited for users and API developers to discover how it can improve their daily work.

Appendix A

Dataset	Metric	gpt-4o	o1-preview	o1
Competition Math AIME (2024)	cons@64	13.4	56.7	83.3
Competition Math AIME (2024)	pass@1	9.3	44.6	74.4
Competition Code CodeForces	Elo	808	1258	1673
Competition Code CodeForces	Percentile	11.0	62.0	89.0
GPQA Diamond	cons@64	56.1	78.3	78.0
GPQA Diamond	pass@1	50.6	73.3	77.3
Biology	cons@64	63.2	73.7	68.4
Biology	pass@1	61.6	65.9	69.2
Chemistry	cons@64	43.0	60.2	65.6
Chemistry	pass@1	40.2	59.9	64.7
Physics	cons@64	68.6	89.5	94.2
Physics	pass@1	59.5	89.4	92.8
MATH	pass@1	60.3	85.5	94.8
MMLU	pass@1	88.0	92.3	90.8
MMMU (val)	pass@1	69.1	n/a	78.2
MathVista (testmini)	pass@1	63.8	n/a	73.9

Authors

OpenAI

Citations

1
https://www.anthropic.com/news/claude-3-5-sonnet, https://deepmind.google/technologies/gemini/pro
2
Our evaluations used the same 500-problem test split found in https://arxiv.org/abs/2305.20050
3
https://codeforces.com/blog/entry/68288