September 12, 2024

Learning to reason with LLMs


OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA). While the work needed to make this new model as easy to use as current models is still ongoing, we are releasing an early version of this model, OpenAI o1-preview, for immediate use in ChatGPT and to trusted API users.

Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute). The constraints on scaling this approach differ substantially from those of LLM pretraining, and we are continuing to investigate them.

[Figure: two scatter plots of o1 AIME accuracy, during training (left) and at test time (right). y-axis: pass@1 accuracy; x-axis: compute (log scale). Accuracy rises as compute increases.]

o1 performance smoothly improves with both train-time and test-time compute

Evaluations

To highlight the reasoning improvement over GPT-4o, we tested our models on a diverse set of human exams and ML benchmarks. We show that o1 significantly outperforms GPT-4o on the vast majority of these reasoning-heavy tasks. Unless otherwise specified, we evaluated o1 on the maximal test-time compute setting.


Competition evals for math (AIME 2024), code (CodeForces), and PhD-level science questions (GPQA Diamond): o1 greatly improves over GPT-4o on challenging reasoning benchmarks. Solid bars show pass@1 accuracy and the shaded region shows the performance of majority vote (consensus) with 64 samples.
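The two metrics in these charts can be illustrated with a minimal sketch, assuming each problem has a single reference answer compared by exact string match: pass@1 averages the fraction of correct samples per problem, while consensus (cons@k) scores only the most common answer among the k samples. The function names and the tie-breaking rule (the answer seen first wins) are illustrative assumptions, not OpenAI's evaluation harness.

```python
from collections import Counter


def pass_at_1(samples_per_problem: list[list[str]], references: list[str]) -> float:
    """Average, over problems, of the fraction of sampled answers that match the reference."""
    per_problem = [
        sum(answer == ref for answer in samples) / len(samples)
        for samples, ref in zip(samples_per_problem, references)
    ]
    return sum(per_problem) / len(per_problem)


def majority_vote(answers: list[str]) -> str:
    """Most common answer; Counter.most_common keeps first-seen order among equal counts."""
    return Counter(answers).most_common(1)[0][0]


def cons_at_k(samples_per_problem: list[list[str]], references: list[str]) -> float:
    """Fraction of problems whose majority-vote answer matches the reference."""
    correct = sum(
        majority_vote(samples) == ref
        for samples, ref in zip(samples_per_problem, references)
    )
    return correct / len(references)


# Two toy problems with three samples each (illustrative data, not AIME results).
samples = [["1", "1", "2"], ["5", "6", "6"]]
refs = ["1", "5"]
print(pass_at_1(samples, refs))  # 0.5
print(cons_at_k(samples, refs))  # 0.5
```

In the toy data, consensus rescues the first problem (two of three samples agree on the right answer) but fails the second, where the wrong answer is the majority.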

Breakdown of the accuracy and raw score of GPT-4o vs. o1 on various competition evals: o1 improves over GPT-4o on a wide range of benchmarks, including 54/57 MMLU subcategories. Seven are shown for illustration.


On many reasoning-heavy benchmarks, o1 rivals the performance of human experts. Recent frontier models¹ do so well on MATH² and GSM8K that these benchmarks are no longer effective at differentiating models. We evaluated math performance on AIME, an exam designed to challenge the brightest high school math students in America. On the 2024 AIME exams, GPT-4o solved on average only 12% (1.8/15) of problems. o1 averaged 74% (11.1/15) with a single sample per problem, 83% (12.5/15) with consensus among 64 samples, and 93% (13.9/15) when re-ranking 1000 samples with a learned scoring function. A score of 13.9 places it among the top 500 students nationally and above the cutoff for the USA Mathematical Olympiad.
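Re-ranking with a learned scoring function amounts to best-of-N selection: score every sampled answer and keep the highest-scoring one. The sketch below assumes the scorer is any callable returning a float; `rerank_best_of_n` and the toy scores are hypothetical stand-ins, since the post does not describe the scoring model. The final loop checks the arithmetic behind the quoted percentages, given that each AIME exam has 15 problems.

```python
from typing import Callable, Sequence


def rerank_best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the candidate the scoring function rates highest (first one wins on ties)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


# Hypothetical scores standing in for a learned scoring function.
toy_scores = {"204": 0.41, "210": 0.93, "197": 0.12}
best = rerank_best_of_n(["204", "197", "210", "204"], lambda answer: toy_scores[answer])
print(best)  # 210

# Average problems solved out of 15, converted to the rounded percentages in the text.
for solved in (1.8, 11.1, 12.5, 13.9):
    print(f"{solved}/15 = {solved / 15:.0%}")
```

Unlike majority voting, best-of-N can pick a correct answer that appears only once among the samples, provided the scorer ranks it highest.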

We also evaluated o1 on GPQA Diamond, a difficult intelligence benchmark that tests for expertise in chemistry, physics, and biology. To compare models to humans, we recruited experts with PhDs to answer GPQA Diamond questions. We found that o1 surpassed the performance of those human experts, becoming the first model to do so on this benchmark. These results do not imply that o1 is more capable than a PhD in all respects, only that the model is more proficient at solving some problems that a PhD would be expected to solve. On several other ML benchmarks, o1 improved on the state of the art. With its vision perception capabilities enabled, o1 scored 78.2% on MMMU, making it the first model to be competitive with human experts. It also outperformed GPT-4o on 54 out of 57 MMLU subcategories.

Chain of Thought

Similar to how a human may think for a long time before responding to a difficult question, o1 uses a chain of thought when attempting to solve a problem. Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses. It learns to recognize and correct its mistakes. It learns to break down tricky steps into simpler ones. It learns to try a different approach when the current one isn't working. This process dramatically improves the model's ability to reason. To illustrate this leap forward, we showcase the chain of thought from o1-preview on several difficult problems below.

GPT-4o

User

oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step

Use the example above to decode:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Let’s break this down step by step based on the example:

1. Example given:
• Input: oyfjdnisdr rtqwainr acxz mynzbhhx
• Output: Think step by step

By examining the words:
• The pattern involves selecting specific letters or transforming them.

2. Now, let’s decode the new phrase:
• Input: oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Step-by-step decoding process:

Identify possible word divisions.

The first phrase might follow the same structure as the example. It seems like the input text could be broken into groups based on natural separations or patterns.

A possible division could be:

oyekaijzdf
aaptcg
suaokybhai
ouow
aqht
mynznvaatzacdfoulxxz

However, to make a meaningful decoding, I would need more context about the transformations or letter shifting that might be involved.

Could you provide any additional decoding rules or transformations used in this cipher?

OpenAI o1-preview

User

oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step

Use the example above to decode:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Thought for 5 seconds

First, what is going on here?

We are given:

First, an example:

"oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step"

We can see that when "oyfjdnisdr rtqwainr acxz mynzbhhx" is transformed to "Think step by step"

Our task is: Use the example above to decode: "oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

So the first part is to figure out how "oyfjdnisdr rtqwainr acxz mynzbhhx" is decoded into "Think step by step"

Observation 1: The phrase "Think step by step" is 18 letters (excluding spaces) . Let's count the letters.

Similarly, let's count the letters in "oyfjdnisdr rtqwainr acxz mynzbhhx"

Original Ciphertext:

oyfjdnisdr rtqwainr acxz mynzbhhx

Plaintext:

Think step by step

First, break down the letters:

Ciphertext letters: o y f j d n i s d r r t q w a i n r a c x z m y n z b h h x

Total letters: Let's count the letters in the ciphertext.

First word: oyfjdnisdr (10 letters)

Second word: rtqwainr (8 letters)

Third word: acxz (4 letters)

Fourth word: mynzbhhx (8 letters)

Total letters in ciphertext: 10 + 8 + 4 + 8 = 30 letters

Plaintext: Think step by step

First word: Think (5 letters)

Second word: step (4 letters)

Third word: by (2 letters)

Fourth word: step (4 letters)

Total letters in plaintext: 5 + 4 + 2 + 4 = 15 letters

Hmm.

But actually in the problem it says the example:

[Ciphertext] -> Think step by step

So is the decoding process that the ciphertext maps to "Think step by step"?

Our goal is to use the example above to decode:

"oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

First, the cipher seems connected to the plaintext.

Given the time constraints, perhaps the easiest way is to try to see patterns.

Option 1: Try to find mapping from letters to letters.

Do any letters match?

First, let's write down the ciphertext and plaintext letters on top of each other.

Ciphertext: o y f j d n i s d r r t q w a i n r a c x z m y n z b h h x

Plaintext: T h i n k s t e p b y s t e p

Wait a minute.

I think maybe there is an anagram or substitution cipher here.

Alternatively, I think that we can notice that each group of ciphertext corresponds to a plaintext word.

Check the number of letters.

First ciphertext word: oyfjdnisdr (10 letters)

First plaintext word: Think (5 letters)

Hmm.

Second ciphertext word: rtqwainr (8 letters)

Second plaintext word: step (4 letters)

Third ciphertext word: acxz (4 letters)

Third plaintext word: by (2 letters)

Fourth ciphertext word: mynzbhhx (8 letters)

Fourth plaintext word: step (4 letters)

Interesting.

It seems that the ciphertext words are exactly twice as long as the plaintext words.

(10 vs 5, 8 vs 4, 4 vs 2, 8 vs 4)

Idea: Maybe we need to take every other letter or rebuild the plaintext from the ciphertext accordingly.

Let's test this theory.

If we take every other letter from the ciphertext words, maybe we'll get the plaintext.

Let's try this for the first word:

Ciphertext word: o y f j d n i s d r (10 letters)

Alternate letters:

First letters (positions 1,3,5,7,9): o f d i d

Second letters (positions 2,4,6,8,10): y j n s r

Wait: Let's try taking every second letter or first letter.

Option 1: Take the odd positions:

Positions: 1 3 5 7 9

Letters: o f d i d

Plaintext word is "Think", letters are T h i n k

Hmm.

Perhaps we can map letters:

o f d i d --> Think

Not directly

Option 2: Try mapping as per an assigned code: perhaps columns of letters?

Alternatively, perhaps the cipher is more complex.

Alternatively, notice that "oyfjdnisdr" has 10 letters and "Think" has 5 letters.

What if each plaintext letter is being represented by two ciphertext letters?

Similarly for other words.

Suppose that each plaintext letter corresponds to pair of letters in the ciphertext.

Meaning that the first two letters of ciphertext map to the first plaintext letter, and so on.

So for the first word, partition the ciphertext letters into pairs:

oy fj dn is dr

Corresponding to plaintext letters T h i n k

So we can attempt to map:

oy -> T

fj -> h

dn -> i

is -> n

dr -> k

If this is the mapping, perhaps each pair of letters maps to one plaintext letter.

Similarly for other words.

Let's test this mapping.

Is there a consistent mapping from pairs to letters?

Let's make a table of pairs and corresponding letters.

First, make the pairs from the first word and their corresponding plaintext letters:

Ciphertext Pairs -> Plaintext Letter

oy -> T

fj -> h

dn -> i

is -> n

dr -> k

Now, what's the mapping from "oy" to 'T'?

Is there a way to map 'oy' to 'T'?

Similarly for others.

One possible way is that 'oy' represents 'T'.

But how?

Option 1: Sum of the letters.

Convert letters to numbers (A=1, B=2, ..., Z=26)

So let's try to get to 'T' from 'o' and 'y'

o = 15

y = 25

Sum: 15 +25 = 40

But 'T' is 20.

Alternatively, perhaps subtract: 25 -15 = 10.

No.

Alternatively, perhaps combine the numbers in some way.

Alternatively, think about their positions in the alphabet.

Alternatively, perhaps the letters are encrypted via a code.

Alternatively, perhaps if we overlay the word 'Think' over the cipher pairs 'oy', 'fj', etc., the cipher is formed by substituting each plaintext letter with two letters.

Alternatively, perhaps consider the 'original' letters.

Let's look at the letters between the word "Think" and "oyfjdnisdr"

Perhaps we can spot that "oyfjdnisdr" is an anagram for "Thinkfiles"?

But that seems unlikely.

Alternatively, think of the pairs.

Suppose we assign numbers to letters: a=1, b=2, ..., z=26

Consider the numbers in 'oy':

o = 15

y = 25

Add them: 15 + 25 = 40

Now, for 'T' we have:

'T' is number 20

What if we take the sum of the cipher pair mod 26.

15 +25 = 40

Then 40 mod 26 = 14

14 corresponds to 'N'

Doesn't match.

Alternatively, see if the cipher pairs correspond to something else.

Alternatively, perhaps the pair 'oy' corresponds to 'T' via key.

Alternatively, perhaps it's as simple as 'oy' is 'T'.

But 'o' corresponds to 'T', subtracting 'o' ASCII code from 'T'.

Now, let's think in terms of symmetric letters.

Alternatively, perhaps we can spot that the letters correspond to the first letters of words.

But perhaps an easier way is to look at the word positions.

Alternatively, perhaps we can see that the ciphertext letters are offsets from the plaintext letters.

So let's try to map the pairs to the plaintext letters.

First pair 'oy' and plaintext letter 'T'

Letter 'o' is 15, 'T' is 20

Letter 'o' + 5 = 'T' (15 +5=20)

But 'y' is 25, which is 5 letters after 'T' (20 +5=25)

Interesting, seems that 'o' +5 = 'T'

Similarly, 'y' +1 = 'Z', which is not 'T'.

Alternatively, maybe the average of the two letters corresponds to 'T'

(15 +25)/2 =20, which is 'T'

Wait a minute, that seems promising.

First pair: 'o' (15) + 'y' (25) = 40

40 /2 =20

20 corresponds to 'T'

So perhaps the average of the letters in the pair corresponds to the plaintext letter.

Let's check this with the second pair.

Second pair: 'fj' corresponding to 'h'

Letters 'f'=6, 'j'=10

Sum: 6+10=16

Average:16/2=8

8 corresponds to 'h' (since 'h' is 8)

Success!

Third pair: 'dn' to 'i'

'd'=4, 'n'=14

Sum:4+14=18

Average:18/2=9

9 corresponds to 'i'(9='i')

But 'i' is 9, so that seems off by 1.

So perhaps we need to think carefully about letters.

Wait, 18/2=9, 9 corresponds to 'I'

So this works.

Fourth pair: 'is' corresponding to 'n'

'i'=9, 's'=19

Sum:9+19=28

Average:28/2=14

14 corresponds to 'n'(14='N')

Yes!

Fifth pair: 'dr' corresponds to 'k'

'd'=4, 'r'=18

Sum:4+18=22

Average:22/2=11

11 corresponds to 'k'(11='K')

Perfect!

So our code is: For each pair, sum their numeric values, divide by 2, get the corresponding letter.

Alternatively, the average of the numeric values of the letters corresponds to the plaintext letter.

This seems to be a valid method.

Now let's test this with the other words.

Second word ciphertext: rtqwainr

Ciphertext pairs:

rt -> s

qw -> ?

ai -> ?

nr -> ?

Corresponding to plaintext letters 's', 't', 'e', 'p'

So let's take the pairs:

1. 'rt' corresponds to 's'

r=18, t=20

Sum:38

Average:19

19 corresponds to 's' (19='S')

Good

2. 'qw' corresponds to 't'

q=17, w=23

Sum:17+23=40

Average:20

20 corresponds to 't' (20='T')

Good

3. 'ai' corresponds to 'e'

a=1, i=9

Sum:10

Average:5

5 corresponds to 'e' (5='E')

Good

4. 'nr' corresponds to 'p'

n=14, r=18

Sum:32

Average:16

16 corresponds to 'p' (16='P')

Perfect!

Third word: acxz (4 letters)

Plaintext word: 'by' (2 letters)

Ciphertext pairs:

a c

x z

Plaintext letters: 'b', 'y'

First pair: 'a', 'c'

a=1, c=3

Sum:4

Average:2

2 corresponds to 'b'(2='B')

Second pair: 'x' and 'z'

x=24, z=26

Sum:50

Average:25

25 corresponds to 'y'(25='Y')

Yes!

Fourth word: mynzbhhx (8 letters)

Plaintext word: 'step' (4 letters)

Ciphertext pairs:

my

nz

bh

hx

Plaintext letters 's','t','e','p'

First pair 'm','y'

m=13, y=25

Sum:38

Average:19

19 corresponds to 's'(19='S')

Second pair: 'n','z'

n=14, z=26

Sum: 40

Average:20

20='t'

Third pair: 'b','h'

b=2, h=8

Sum:10

Average:5

5='e'

Fourth pair: 'h','x'

h=8, x=24

Sum:32

Average:16

16='p'

Got it!

Now, given this, we can decode the given message.

Ciphertext to decode:

"oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

First, we need to divide into words.

Given that the ciphertext in the example had words of length:

-10 letters (5 pairs)
-8 letters (4 pairs)
-4 letters (2 pairs)
-8 letters (4 pairs)

Transported to plaintext words of:

-5 letters
-4 letters
-2 letters
-4 letters

So perhaps in our supplied ciphertext, we need to divide the letters into words accordingly.

Ciphertext: o y e k a i j z d f a a p t c g s u a o k y b h a i o u o w a q h t m y n z n v a a t z a c d f o u l x x z

First, let's write the letters in order:

1) oyekaijzdf (10 letters) -> word of 5 letters

2) aaptcg (6 letters) -> word of 3 letters

3) suaokybhai (10 letters) -> word of 5 letters

4) ouow (4 letters) -> word of 2 letters

5) aqht (4 letters) -> word of 2 letters

6) mynznvaatzacdfoulxxz (22 letters) -> word of 11 letters

Totals:

So the letters are:

First word: oyekaijzdf (10 letters = 5 pairs)

Second word: aaptcg (6 letters=3 pairs)

Third word: suaokybhai (10 letters=5 pairs)

Fourth word: ouow (4 letters=2 pairs)

Fifth word: aqht (4 letters=2 pairs)

Sixth word: mynznvaatzacdfoulxxz (22 letters=11 pairs)

So we can suppose that the plaintext words have lengths:

1) 5 letters

2) 3 letters

3) 5 letters

4) 2 letters

5) 2 letters

6)11 letters

Now we can proceed to decode.

First word: oyekaijzdf

Divide into pairs:

oy

ek

ai

jz

df

So we can find the plaintext letters by averaging the numeric values of the letters in each pair.

First pair: 'o','y'

o=15, y=25

Sum=40

Average=20

20='T'

Second pair: 'e','k'

e=5, k=11

Sum=16

Average=8

8='H'

Third pair: 'a','i'

a=1, i=9

Sum=10

Average=5

5='E'

Fourth pair: 'j','z'

j=10, z=26

Sum=36

Average=18

18='R'

Fifth pair: 'd','f'

d=4, f=6

Sum=10

Average=5

5='E'

So the first word is 'T','H','E','R','E'

So 'THERE'

Second word: aaptcg

Pairs:

aa

pt

cg

First pair: 'a','a'

a=1, a=1

Sum=2

Average=1

1='A'

Second pair: 'p','t'

p=16, t=20

Sum=36

Average=18

18='R'

Third pair: 'c','g'

c=3, g=7

Sum=10

Average=5

5='E'

So second word: 'A','R','E'

Third word: suaokybhai

Pairs:

su

ao

ky

bh

ai

First pair: s=19, u=21

Sum=40, Average=20

20='T'

Second pair: a=1, o=15

Sum=16, Average=8

8='H'

Third pair: k=11, y=25

Sum=36, Average=18

18='R'

Fourth pair: b=2, h=8

Sum=10, Average=5

5='E'

Fifth pair: a=1, i=9

Sum=10, Average=5

5='E'

Third word: 'T','H','R','E','E'

Fourth word: ouow

Pairs: ou, ow

First pair: o=15, u=21

Sum=36, Average=18

18='R'

Second pair: o=15, w=23

Sum=38, Average=19

19='S'

Fourth word: 'R','S'

Fifth word: aqht

Pairs: a q, h t

First pair: a=1, q=17

Sum=18, Average=9

9='I'

Second pair: h=8, t=20

Sum=28, Average=14

14='N'

Fifth word: 'I','N'

Sixth word: mynznvaatzacdfoulxxz (22 letters: 11 pairs)

Pairs:

m y

n z

n v

a a

t z

a c

d f

o u

l x

x z

Let's list them properly.

Wait, earlier I missed some letters there.

Let's re-express the sixth word letters:

m y n z n v a a t z a c d f o u l x x z

Total letters: 22 letters

Pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

Wait, that's 21 letters.

But in total, we have 22 letters; I think I have a mismatch.

Wait, in the problem the sixth word is:

mynznvaatzacdfoulxxz

Let's separate the letters:

m y n z n v a a t z a c d f o u l x x z

Total letters: 22 letters

So pairs would be:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

Wait, 'z' is only one letter at the end.

So maybe the letters are:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z (left alone?)

But then we're missing a letter.

Alternatively, perhaps the 'z' at the end is an extra letter or it's paired with the previous 'x'.

Alternatively, perhaps there is a mistake.

Alternatively, perhaps when we reach the last 'x x z', we should write:

9. l x

10. x z

Total pairs: 11

So updated pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x z

Now total pairs: 10 pairs. But we are supposed to have 11 letters in the plaintext word.

So perhaps I have miscounted and not considered including 'f' or 'z'.

Wait, is there any chance that the sequence is:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

We have 22 letters.

Perhaps the 'x x' and 'z' are overlapping.

But we need 11 pairs.

I think for this to work, we need to create 11 pairs from the 22 letters.

So let's try pairing letters in order, taking two letters at a time.

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z ?

But again, last pair only has one letter 'z'

Alternatively, we might need to adjust the pairs.

Perhaps it's 'ou', 'lx', 'xz', and there is no extra 'x'.

Maybe I miswrote.

Wait, let's write the letters as they come:

m y n z n v a a t z a c d f o u l x x z

Positions:

1 m

2 y

3 n

4 z

5 n

6 v

7 a

8 a

9 t

10 z

11 a

12 c

13 d

14 f

15 o

16 u

17 l

18 x

19 x

20 z

So 20 letters.

Now that makes sense.

Now we can divide into pairs:

Pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x z

Total of 10 pairs, which will correspond to 10 letters.

But earlier we thought the word was 11 letters.

But 10 letters may be acceptable.

Maybe we should proceed with the 10 pairs.

First pair: m y

m=13, y=25

Sum=38

Average=19

19='S'

Second pair: n z

n=14, z=26

Sum=40

Average=20

20='T'

Third pair: n v

n=14, v=22

Sum=36

Average=18

18='R'

Fourth pair: a a

a=1, a=1

Sum=2

Average=1

1='A'

Fifth pair: t z

t=20, z=26

Sum=46

Average=23

23='W'

Sixth pair: a c

a=1, c=3

Sum=4

Average=2

2='B'

Seventh pair: d f

d=4, f=6

Sum=10

Average=5

5='E'

Eighth pair: o u

o=15, u=21

Sum=36

Average=18

18='R'

Ninth pair: l x

l=12, x=24

Sum=36

Average=18

18='R'

Tenth pair: x z

x=24, z=26

Sum=50

Average=25

25='Y'

So the tenth word is 'S', 'T', 'R', 'A', 'W', 'B', 'E', 'R', 'R', 'Y'

Wait that's 10 letters

So the sixth word is 'STRAWBERRY', which makes sense.

So putting all these translated letters together:

First word: 'THERE'

Second word: 'ARE'

Third word: 'THREE'

Fourth word: 'RS' (? Doesn't make sense)

Wait, the fourth word, the pairs were:

ou ow

Pairs:

First pair: o u

o=15, u=21

Sum=36, Average=18

18='R'

Second pair: o w

o=15, w=23

Sum=38, Average=19

19='S'

So we get 'R','S'

Similarly, fifth word 'IN'

'R', 'S' 'I','N'

So the message so far is

'THERE ARE THREE RS IN'

And then 'STRAWBERRY'

Wait, let's put the words together:

'THERE ARE THREE Rs IN STRAWBERRY'

I think 'RS' is 'R's', meaning 'Rs'

So the message is:

'THERE ARE THREE R'S IN STRAWBERRY'

Which is true.

So the decoded message is:

"THERE ARE THREE R'S IN STRAWBERRY"

To decode the given ciphertext using the example provided, we’ll use the pattern identified in the example.

Ciphertext Example:

oyfjdnisdr rtqwainr acxz mynzbhhx

Decoded as:

Think step by step

Decoding Method:

1. Pair the letters in the ciphertext.
2. Convert each letter to its numerical position in the alphabet (A=1, B=2, …, Z=26).
3. Sum the numerical values of each pair.
4. Compute the average of the sum (divide by 2).
5. Convert the average back to a letter (1=A, 2=B, …, 26=Z).

Apply this method to the given ciphertext:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Break down the ciphertext into pairs:

1. First word: “oyekaijzdf”
   1. Pairs: oy, ek, ai, jz, df
   2. Decoded letters:
      1. oy → (15+25)/2 = 20 → T
      2. ek → (5+11)/2 = 8 → H
      3. ai → (1+9)/2 = 5 → E
      4. jz → (10+26)/2 = 18 → R
      5. df → (4+6)/2 = 5 → E
   3. Decoded word: THERE
2. Second word: “aaptcg”
   1. Pairs: aa, pt, cg
   2. Decoded letters:
      1. aa → (1+1)/2 = 1 → A
      2. pt → (16+20)/2 = 18 → R
      3. cg → (3+7)/2 = 5 → E
   3. Decoded word: ARE
3. Third word: “suaokybhai”
   1. Pairs: su, ao, ky, bh, ai
   2. Decoded letters:
      1. su → (19+21)/2 = 20 → T
      2. ao → (1+15)/2 = 8 → H
      3. ky → (11+25)/2 = 18 → R
      4. bh → (2+8)/2 = 5 → E
      5. ai → (1+9)/2 = 5 → E
   3. Decoded word: THREE
4. Fourth word: “ouow”
   1. Pairs: ou, ow
   2. Decoded letters:
      1. ou → (15+21)/2 = 18 → R
      2. ow → (15+23)/2 = 19 → S
   3. Decoded letters: R’S
5. Fifth word: “aqht”
   1. Pairs: aq, ht
   2. Decoded letters:
      1. aq → (1+17)/2 = 9 → I
      2. ht → (8+20)/2 = 14 → N
   3. Decoded word: IN
6. Sixth word: “mynznvaatzacdfoulxxz”
   1. Pairs: my, nz, nv, aa, tz, ac, df, ou, lx, xz
   2. Decoded letters:
      1. my → (13+25)/2 = 19 → S
      2. nz → (14+26)/2 = 20 → T
      3. nv → (14+22)/2 = 18 → R
      4. aa → (1+1)/2 = 1 → A
      5. tz → (20+26)/2 = 23 → W
      6. ac → (1+3)/2 = 2 → B
      7. df → (4+6)/2 = 5 → E
      8. ou → (15+21)/2 = 18 → R
      9. lx → (12+24)/2 = 18 → R
      10. xz → (24+26)/2 = 25 → Y
   3. Decoded word: STRAWBERRY

Final Decoded Message:

THERE ARE THREE R'S IN STRAWBERRY

Answer:
THERE ARE THREE R’S IN STRAWBERRY
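The pair-averaging rule that o1-preview arrived at can be checked mechanically. Below is a minimal sketch, assuming lowercase ASCII ciphertext with single spaces between words; rejecting odd-length words and pairs with an odd sum is an added safeguard, not part of the original puzzle.

```python
def decode_pair_average(ciphertext: str) -> str:
    """Decode the puzzle cipher: each plaintext letter is the average of a pair of ciphertext letters.

    Letters are numbered a=1 ... z=26; words stay separated by single spaces.
    """
    decoded_words = []
    for word in ciphertext.lower().split():
        if not (word.isascii() and word.isalpha()) or len(word) % 2 != 0:
            raise ValueError(f"expected an even number of ASCII letters, got {word!r}")
        letters = []
        for i in range(0, len(word), 2):
            first = ord(word[i]) - ord("a") + 1  # a=1 ... z=26
            second = ord(word[i + 1]) - ord("a") + 1
            total = first + second
            if total % 2 != 0:
                raise ValueError(f"pair {word[i:i + 2]!r} has no whole-number average")
            letters.append(chr(ord("A") + total // 2 - 1))  # 1=A ... 26=Z
        decoded_words.append("".join(letters))
    return " ".join(decoded_words)


print(decode_pair_average("oyfjdnisdr rtqwainr acxz mynzbhhx"))
# THINK STEP BY STEP
print(decode_pair_average("oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"))
# THERE ARE THREE RS IN STRAWBERRY
```

The script prints "RS" for the fourth word; reading it as "R'S" is the model's interpretation, since the cipher carries no punctuation.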

Coding

We trained a model that scored 213 points and ranked in the 49th percentile in the 2024 International Olympiad in Informatics (IOI) by initializing from o1 and training it further to improve its programming skills. This model competed in the 2024 IOI under the same conditions as the human contestants. It had ten hours to solve six challenging algorithmic problems and was allowed 50 submissions per problem.

For each problem, our system sampled many candidate submissions and submitted 50 of them based on a test-time selection strategy. Submissions were selected based on performance on the IOI public test cases, model-generated test cases, and a learned scoring function. If we had instead submitted at random, we would have scored only 156 points on average, suggesting that this strategy was worth nearly 60 points under competition constraints.
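A test-time selection strategy like this one can be sketched as ranking the candidate pool and keeping the top 50. The post names the three signals but not how they were combined, so the lexicographic ordering below (public tests first, then model-generated tests, then the learned score) is an assumption, and `Candidate` and `select_submissions` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One sampled program plus the three selection signals (field names are hypothetical)."""
    code: str
    public_tests_passed: int  # IOI public test cases passed
    generated_tests_passed: int  # model-generated test cases passed
    learned_score: float  # output of a learned scoring function


def select_submissions(candidates: list[Candidate], budget: int = 50) -> list[Candidate]:
    """Return up to `budget` candidates, best first.

    Assumption: rank lexicographically by public tests, then generated tests,
    then learned score; the post does not specify the actual combination.
    """
    ranked = sorted(
        candidates,
        key=lambda c: (c.public_tests_passed, c.generated_tests_passed, c.learned_score),
        reverse=True,
    )
    return ranked[:budget]


pool = [
    Candidate("a", public_tests_passed=3, generated_tests_passed=1, learned_score=0.1),
    Candidate("b", public_tests_passed=3, generated_tests_passed=2, learned_score=0.0),
    Candidate("c", public_tests_passed=2, generated_tests_passed=9, learned_score=0.9),
]
print([c.code for c in select_submissions(pool, budget=2)])  # ['b', 'a']
```

A lexicographic key makes the public tests a hard filter. A weighted sum of the three signals would be an equally plausible alternative.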

With a relaxed submission constraint, we found that model performance improved significantly. When allowed 10,000 submissions per problem, the model achieved a score of 362.14, above the gold medal threshold, even without any test-time selection strategy.
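A likely reason more submissions help even without selection is that task scores can be aggregated across submissions. The sketch below assumes a per-subtask maximum: a task's score is the sum over subtasks of the best points any submission earned on that subtask, so an extra submission can never lower the total. The rule is stated here as an assumption about IOI-style scoring, and the point values are made up.

```python
def task_score(submission_results: list[list[float]]) -> float:
    """Score one task as the sum over subtasks of the best points any submission earned.

    submission_results[i][j] is the points submission i earned on subtask j.
    Assumption: this per-subtask maximum is the scoring rule; the data below is made up.
    """
    if not submission_results:
        return 0.0
    # zip(*rows) turns per-submission rows into per-subtask columns.
    return float(sum(max(column) for column in zip(*submission_results)))


# Three submissions on a task with three subtasks (illustrative points).
results = [
    [10, 0, 0],
    [0, 25, 0],
    [10, 0, 5],
]
print(task_score(results[:1]))  # 10.0
print(task_score(results))  # 40.0
```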

Finally, we simulated competitive programming contests hosted by Codeforces to demonstrate this model's coding skill. Our evaluations closely matched competition rules and allowed for 10 submissions. GPT-4o achieved an Elo rating³ of 808, which is in the 11th percentile of human competitors. This model far exceeded both GPT-4o and o1: it achieved an Elo rating of 1807, performing better than 93% of competitors.
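To make these rating gaps concrete, the standard Elo expected-score formula E_A = 1 / (1 + 10^((R_B - R_A)/400)) gives the expected score of player A against player B. Codeforces uses its own Elo-style update rules, so treat this as an illustration of what a rating difference means rather than of how Codeforces computes ratings.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo logistic model (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Rating gaps quoted in the text.
print(f"{elo_expected_score(1807, 808):.3f}")  # 0.997
print(f"{elo_expected_score(1807, 1673):.3f}")
```

Under this model, a gap of roughly 1000 points means the higher-rated player is expected to win almost every head-to-head game.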

[Figure: Codeforces Elo rating and percentile by model. GPT-4o: 808 (11th percentile); o1-preview: 1258 (62nd percentile); o1: 1673 (89th percentile); o1-ioi: 1807 (93rd percentile).]

Further fine-tuning on programming competitions improves o1. The improved model ranked in the 49th percentile in the 2024 International Olympiad in Informatics under competition rules.

Human preference evaluation

In addition to exams and academic benchmarks, we also evaluated human preference for o1-preview vs. GPT-4o on challenging, open-ended prompts across a broad spectrum of domains. In this evaluation, human trainers were shown anonymized responses to a prompt from o1-preview and GPT-4o and voted for the response they preferred. o1-preview is preferred to GPT-4o by a large margin in reasoning-heavy categories such as data analysis, coding, and math. However, o1-preview is not preferred on some natural language tasks, suggesting that it is not well-suited for all use cases.

Safety

Chain-of-thought reasoning provides new opportunities for alignment and safety. We found that integrating our policies for model behavior into the chain of thought of a reasoning model is an effective way to robustly teach human values and principles. By teaching the model our safety rules and how to reason about them in context, we found evidence of reasoning capability directly benefiting model robustness: o1-preview achieved substantially improved performance on key jailbreak evaluations and on our hardest internal benchmarks for evaluating our model's safety refusal boundaries. We believe that using a chain of thought offers significant advances for safety and alignment because (1) it enables us to observe the model's thinking in a legible way, and (2) the model's reasoning about safety rules is more robust in out-of-distribution scenarios.

To stress-test our improvements, we conducted a suite of safety tests and red-teaming before deployment, in accordance with our Preparedness Framework. We found that chain-of-thought reasoning contributed to capability improvements across our evaluations. Of particular note, we observed interesting instances of reward hacking. Detailed results from these evaluations can be found in the accompanying System Card.

Metric	GPT-4o	o1-preview
% Safe completions on harmful prompts: Standard	0.990	0.995
% Safe completions on harmful prompts: Challenging (jailbreaks and edge cases)	0.714	0.934
↳ Harassment (severe)	0.845	0.900
↳ Exploitative sexual content	0.483	0.949
↳ Sexual content involving minors	0.707	0.931
↳ Advice about non-violent wrongdoing	0.688	0.961
↳ Advice about violent wrongdoing	0.778	0.963
% Safe completions for top 200 with highest Moderation API scores per category in WildChat (Zhao, et al. 2024)	0.945	0.971
Goodness@0.1 StrongREJECT jailbreak eval (Souly et al. 2024)	0.220	0.840
Human-sourced jailbreak eval	0.770	0.960
% Compliance on internal benign edge cases ("not over-refusal")	0.910	0.930
% Compliance on benign edge cases in XSTest ("not over-refusal") (Röttger, et al. 2023)	0.924	0.976

Hiding the Chains of Thought

We believe that a hidden chain of thought presents a unique opportunity for monitoring models. Assuming it is faithful and legible, the hidden chain of thought allows us to "read the mind" of the model and understand its thought process. For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user. However, for this to work the model must have the freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.

Therefore, after weighing multiple factors, including user experience, competitive advantage, and the option to pursue chain-of-thought monitoring, we have decided not to show the raw chains of thought to users. We acknowledge that this decision has disadvantages. We strive to partially make up for it by teaching the model to reproduce any useful ideas from the chain of thought in the answer. For the o1 model series, we show a model-generated summary of the chain of thought.

Conclusion

o1 significantly advances the state of the art in AI reasoning. We plan to release improved versions of this model as we continue iterating. We expect these new reasoning capabilities to improve our ability to align models with human values and principles. We believe o1 and its successors will unlock many new use cases for AI in science, coding, math, and related fields. We are excited for users and API developers to discover how it can improve their daily work.

Appendix A

Dataset	Metric	gpt-4o	o1-preview	o1
Competition Math AIME (2024)	cons@64	13.4	56.7	83.3
Competition Math AIME (2024)	pass@1	9.3	44.6	74.4
Competition Code CodeForces	Elo	808	1258	1673
Competition Code CodeForces	Percentile	11.0	62.0	89.0
GPQA Diamond	cons@64	56.1	78.3	78.0
GPQA Diamond	pass@1	50.6	73.3	77.3
Biology	cons@64	63.2	73.7	68.4
Biology	pass@1	61.6	65.9	69.2
Chemistry	cons@64	43.0	60.2	65.6
Chemistry	pass@1	40.2	59.9	64.7
Physics	cons@64	68.6	89.5	94.2
Physics	pass@1	59.5	89.4	92.8
MATH	pass@1	60.3	85.5	94.8
MMLU	pass@1	88.0	92.3	90.8
MMMU (val)	pass@1	69.1	n/a	78.2
MathVista (testmini)	pass@1	63.8	n/a	73.9

Authors

OpenAI


Footnotes

1. https://www.anthropic.com/news/claude-3-5-sonnet, https://deepmind.google/technologies/gemini/pro
2. Our evaluations used the same 500-problem test split found in https://arxiv.org/abs/2305.20050
3. https://codeforces.com/blog/entry/68288