September 12, 2024

Learning to reason with LLMs


OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA). While the work needed to make this new model as easy to use as current models is still ongoing, we are releasing an early version of this model, OpenAI o1‑preview, for immediate use in ChatGPT and to trusted API users.

Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute). The constraints on scaling this approach differ substantially from those of LLM pretraining, and we are continuing to investigate them.

The image shows two scatter plots comparing "o1 AIME accuracy" during training and at test time. Both charts have "pass@1 accuracy" on the y-axis and compute (log scale) on the x-axis. The dots indicate increasing accuracy with more compute time.

o1 performance smoothly improves with both train-time and test-time compute

Evals

To highlight the reasoning improvement over GPT‑4o, we tested our models on a diverse set of human exams and ML benchmarks. We show that o1 significantly outperforms GPT‑4o on the vast majority of these reasoning-heavy tasks. Unless otherwise specified, we evaluated o1 on the maximal test-time compute setting.

Competition evals for Math (AIME 2024), Code (CodeForces), and PhD-Level Science Questions (GPQA Diamond): o1 greatly improves over GPT-4o on challenging reasoning benchmarks. Solid bars show pass@1 accuracy and the shaded region shows the performance of majority vote (consensus) with 64 samples.

Breakdown of accuracy and raw scores for gpt-4o and o1 across ML benchmarks and exams: o1 improves over GPT-4o on a wide range of benchmarks, including 54/57 MMLU subcategories. Seven are shown for illustration.

In many reasoning-heavy benchmarks, o1 rivals the performance of human experts. Recent frontier models¹ do so well on MATH² and GSM8K that these benchmarks are no longer effective at differentiating models. We evaluated math performance on AIME, an exam designed to challenge the brightest high school math students in America. On the 2024 AIME exams, GPT‑4o solved on average only 12% (1.8/15) of problems. o1 averaged 74% (11.1/15) with a single sample per problem, 83% (12.5/15) with consensus among 64 samples, and 93% (13.9/15) when re-ranking 1000 samples with a learned scoring function. A score of 13.9 places it among the top 500 students nationally and above the cutoff for the USA Mathematical Olympiad.
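The two sampling metrics named above can be illustrated with a minimal sketch. It assumes pass@1 is estimated as the fraction of independent samples that are correct and that consensus means a plain majority vote over the sampled answers; the sample answers below are made up for illustration and are not results from the evaluation.

```python
from collections import Counter


def pass_at_1(samples, reference):
    """Estimate pass@1 as the fraction of sampled answers equal to the reference."""
    return sum(answer == reference for answer in samples) / len(samples)


def consensus(samples):
    """Majority vote: return the most frequent answer among the samples."""
    return Counter(samples).most_common(1)[0][0]


# Hypothetical sampled answers for a single AIME-style problem
# (AIME answers are integers from 0 to 999).
samples = [204, 204, 113, 204, 73, 204, 113, 204]
reference = 204

print(pass_at_1(samples, reference))    # 0.625
print(consensus(samples) == reference)  # True
```

In this toy case a single sample is right only 62.5% of the time, yet the majority vote recovers the correct answer, which is the effect that makes a cons@64 score higher than pass@1.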

We also evaluated o1 on GPQA diamond, a difficult intelligence benchmark which tests for expertise in chemistry, physics and biology. In order to compare models to humans, we recruited experts with PhDs to answer GPQA-diamond questions. We found that o1 surpassed the performance of those human experts, becoming the first model to do so on this benchmark. These results do not imply that o1 is more capable than a PhD in all respects, only that the model is more proficient in solving some problems that a PhD would be expected to solve. On several other ML benchmarks, o1 improved over the state-of-the-art. With its vision perception capabilities enabled, o1 scored 78.2% on MMMU, making it the first model to be competitive with human experts. It also outperformed GPT‑4o on 54 out of 57 MMLU subcategories.

Chain of Thought

Similar to how a human may think for a long time before responding to a difficult question, o1 uses a chain of thought when attempting to solve a problem. Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses. It learns to recognize and correct its mistakes. It learns to break down tricky steps into simpler ones. It learns to try a different approach when the current one isn't working. This process dramatically improves the model's ability to reason. To illustrate this leap forward, we showcase the chain of thought from o1‑preview on several difficult problems below.

GPT-4o

User

oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step

Use the example above to decode:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Let’s break this down step by step based on the example:

1. Example given:
• Input: oyfjdnisdr rtqwainr acxz mynzbhhx
• Output: Think step by step

By examining the words:
• The pattern involves selecting specific letters or transforming them.

2. Now, let’s decode the new phrase:
• Input: oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Step-by-step decoding process:

Identify possible word divisions.

The first phrase might follow the same structure as the example. It seems like the input text could be broken into groups based on natural separations or patterns.

A possible division could be:

oyekaijzdf
aaptcg
suaokybhai
ouow
aqht
mynznvaatzacdfoulxxz

However, to make a meaningful decoding, I would need more context about the transformations or letter shifting that might be involved.

Could you provide any additional decoding rules or transformations used in this cipher?

OpenAI o1-preview

User

oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step

Use the example above to decode:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Thought for 5 seconds

First, what is going on here?

We are given:

First, an example:

"oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step"

We can see that when "oyfjdnisdr rtqwainr acxz mynzbhhx" is transformed to "Think step by step"

Our task is: Use the example above to decode: "oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

So the first part is to figure out how "oyfjdnisdr rtqwainr acxz mynzbhhx" is decoded into "Think step by step"

Observation 1: The phrase "Think step by step" is 18 letters (excluding spaces) . Let's count the letters.

Similarly, let's count the letters in "oyfjdnisdr rtqwainr acxz mynzbhhx"

Original Ciphertext:

oyfjdnisdr rtqwainr acxz mynzbhhx

Plaintext:

Think step by step

First, break down the letters:

Ciphertext letters: o y f j d n i s d r r t q w a i n r a c x z m y n z b h h x

Total letters: Let's count the letters in the ciphertext.

First word: oyfjdnisdr (10 letters)

Second word: rtqwainr (8 letters)

Third word: acxz (4 letters)

Fourth word: mynzbhhx (8 letters)

Total letters in ciphertext: 10 + 8 + 4 + 8 = 30 letters

Plaintext: Think step by step

First word: Think (5 letters)

Second word: step (4 letters)

Third word: by (2 letters)

Fourth word: step (4 letters)

Total letters in plaintext: 5 + 4 + 2 + 4 = 15 letters

Hmm.

But actually in the problem it says the example:

[Ciphertext] -> Think step by step

So is the decoding process that the ciphertext maps to "Think step by step"?

Our goal is to use the example above to decode:

"oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

First, the cipher seems connected to the plaintext.

Given the time constraints, perhaps the easiest way is to try to see patterns.

Option 1: Try to find mapping from letters to letters.

Do any letters match?

First, let's write down the ciphertext and plaintext letters on top of each other.

Ciphertext: o y f j d n i s d r r t q w a i n r a c x z m y n z b h h x

Plaintext: T h i n k s t e p b y s t e p

Wait a minute.

I think maybe there is an anagram or substitution cipher here.

Alternatively, I think that we can notice that each group of ciphertext corresponds to a plaintext word.

Check the number of letters.

First ciphertext word: oyfjdnisdr (10 letters)

First plaintext word: Think (5 letters)

Hmm.

Second ciphertext word: rtqwainr (8 letters)

Second plaintext word: step (4 letters)

Third ciphertext word: acxz (4 letters)

Third plaintext word: by (2 letters)

Fourth ciphertext word: mynzbhhx (8 letters)

Fourth plaintext word: step (4 letters)

Interesting.

It seems that the ciphertext words are exactly twice as long as the plaintext words.

(10 vs 5, 8 vs 4, 4 vs 2, 8 vs 4)

Idea: Maybe we need to take every other letter or rebuild the plaintext from the ciphertext accordingly.

Let's test this theory.

If we take every other letter from the ciphertext words, maybe we'll get the plaintext.

Let's try this for the first word:

Ciphertext word: o y f j d n i s d r (10 letters)

Alternate letters:

First letters (positions 1,3,5,7,9): o f d i d

Second letters (positions 2,4,6,8,10): y j n s r

Wait: Let's try taking every second letter or first letter.

Option 1: Take the odd positions:

Positions: 1 3 5 7 9

Letters: o f d i d

Plaintext word is "Think", letters are T h i n k

Hmm.

Perhaps we can map letters:

o f d i d --> Think

Not directly

Option 2: Try mapping as per an assigned code: perhaps columns of letters?

Alternatively, perhaps the cipher is more complex.

Alternatively, notice that "oyfjdnisdr" has 10 letters and "Think" has 5 letters.

What if each plaintext letter is being represented by two ciphertext letters?

Similarly for other words.

Suppose that each plaintext letter corresponds to pair of letters in the ciphertext.

Meaning that the first two letters of ciphertext map to the first plaintext letter, and so on.

So for the first word, partition the ciphertext letters into pairs:

oy fj dn is dr

Corresponding to plaintext letters T h i n k

So we can attempt to map:

oy -> T

fj -> h

dn -> i

is -> n

dr -> k

If this is the mapping, perhaps each pair of letters maps to one plaintext letter.

Similarly for other words.

Let's test this mapping.

Is there a consistent mapping from pairs to letters?

Let's make a table of pairs and corresponding letters.

First, make the pairs from the first word and their corresponding plaintext letters:

Ciphertext Pairs -> Plaintext Letter

oy -> T

fj -> h

dn -> i

is -> n

dr -> k

Now, what's the mapping from "oy" to 'T'?

Is there a way to map 'oy' to 'T'?

Similarly for others.

One possible way is that 'oy' represents 'T'.

But how?

Option 1: Sum of the letters.

Convert letters to numbers (A=1, B=2, ..., Z=26)

So let's try to get to 'T' from 'o' and 'y'

o = 15

y = 25

Sum: 15 +25 = 40

But 'T' is 20.

Alternatively, perhaps subtract: 25 -15 = 10.

No.

Alternatively, perhaps combine the numbers in some way.

Alternatively, think about their positions in the alphabet.

Alternatively, perhaps the letters are encrypted via a code.

Alternatively, perhaps if we overlay the word 'Think' over the cipher pairs 'oy', 'fj', etc., the cipher is formed by substituting each plaintext letter with two letters.

Alternatively, perhaps consider the 'original' letters.

Let's look at the letters between the word "Think" and "oyfjdnisdr"

Perhaps we can spot that "oyfjdnisdr" is an anagram for "Thinkfiles"?

But that seems unlikely.

Alternatively, think of the pairs.

Suppose we assign numbers to letters: a=1, b=2, ..., z=26

Consider the numbers in 'oy':

o = 15

y = 25

Add them: 15 + 25 = 40

Now, for 'T' we have:

'T' is number 20

What if we take the sum of the cipher pair mod 26.

15 +25 = 40

Then 40 mod 26 = 14

14 corresponds to 'N'

Doesn't match.

Alternatively, see if the cipher pairs correspond to something else.

Alternatively, perhaps the pair 'oy' corresponds to 'T' via key.

Alternatively, perhaps it's as simple as 'oy' is 'T'.

But 'o' corresponds to 'T', subtracting 'o' ASCII code from 'T'.

Now, let's think in terms of symmetric letters.

Alternatively, perhaps we can spot that the letters correspond to the first letters of words.

But perhaps an easier way is to look at the word positions.

Alternatively, perhaps we can see that the ciphertext letters are offsets from the plaintext letters.

So let's try to map the pairs to the plaintext letters.

First pair 'oy' and plaintext letter 'T'

Letter 'o' is 15, 'T' is 20

Letter 'o' + 5 = 'T' (15 +5=20)

But 'y' is 25, which is 5 letters after 'T' (20 +5=25)

Interesting, seems that 'o' +5 = 'T'

Similarly, 'y' +1 = 'Z', which is not 'T'.

Alternatively, maybe the average of the two letters corresponds to 'T'

(15 +25)/2 =20, which is 'T'

Wait a minute, that seems promising.

First pair: 'o' (15) + 'y' (25) = 40

40 /2 =20

20 corresponds to 'T'

So perhaps the average of the letters in the pair corresponds to the plaintext letter.

Let's check this with the second pair.

Second pair: 'fj' corresponding to 'h'

Letters 'f'=6, 'j'=10

Sum: 6+10=16

Average:16/2=8

8 corresponds to 'h' (since 'h' is 8)

Success!

Third pair: 'dn' to 'i'

'd'=4, 'n'=14

Sum:4+14=18

Average:18/2=9

9 corresponds to 'i'(9='i')

But 'i' is 9, so that seems off by 1.

So perhaps we need to think carefully about letters.

Wait, 18/2=9, 9 corresponds to 'I'

So this works.

Fourth pair: 'is' corresponding to 'n'

'i'=9, 's'=19

Sum:9+19=28

Average:28/2=14

14 corresponds to 'n'(14='N')

Yes!

Fifth pair: 'dr' corresponds to 'k'

'd'=4, 'r'=18

Sum:4+18=22

Average:22/2=11

11 corresponds to 'k'(11='K')

Perfect!

So our code is: For each pair, sum their numeric values, divide by 2, get the corresponding letter.

Alternatively, the average of the numeric values of the letters corresponds to the plaintext letter.

This seems to be a valid method.

Now let's test this with the other words.

Second word ciphertext: rtqwainr

Ciphertext pairs:

rt -> s

qw -> ?

ai -> ?

nr -> ?

Corresponding to plaintext letters 's', 't', 'e', 'p'

So let's take the pairs:

1. 'rt' corresponds to 's'

r=18, t=20

Sum:38

Average:19

19 corresponds to 's' (19='S')

Good

2. 'qw' corresponds to 't'

q=17, w=23

Sum:17+23=40

Average:20

20 corresponds to 't' (20='T')

Good

3. 'ai' corresponds to 'e'

a=1, i=9

Sum:10

Average:5

5 corresponds to 'e' (5='E')

Good

4. 'nr' corresponds to 'p'

n=14, r=18

Sum:32

Average:16

16 corresponds to 'p' (16='P')

Perfect!

Third word: acxz (4 letters)

Plaintext word: 'by' (2 letters)

Ciphertext pairs:

a c

x z

Plaintext letters: 'b', 'y'

First pair: 'a', 'c'

a=1, c=3

Sum:4

Average:2

2 corresponds to 'b'(2='B')

Second pair: 'x' and 'z'

x=24, z=26

Sum:50

Average:25

25 corresponds to 'y'(25='Y')

Yes!

Fourth word: mynzbhhx (8 letters)

Plaintext word: 'step' (4 letters)

Ciphertext pairs:

my

nz

bh

hx

Plaintext letters 's','t','e','p'

First pair 'm','y'

m=13, y=25

Sum:38

Average:19

19 corresponds to 's'(19='S')

Second pair: 'n','z'

n=14, z=26

Sum: 40

Average:20

20='t'

Third pair: 'b','h'

b=2, h=8

Sum:10

Average:5

5='e'

Fourth pair: 'h','x'

h=8, x=24

Sum:32

Average:16

16='p'

Got it!

Now, given this, we can decode the given message.

Ciphertext to decode:

"oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

First, we need to divide into words.

Given that the ciphertext in the example had words of length:

-10 letters (5 pairs)
-8 letters (4 pairs)
-4 letters (2 pairs)
-8 letters (4 pairs)

Transported to plaintext words of:

-5 letters
-4 letters
-2 letters
-4 letters

So perhaps in our supplied ciphertext, we need to divide the letters into words accordingly.

Ciphertext: o y e k a i j z d f a a p t c g s u a o k y b h a i o u o w a q h t m y n z n v a a t z a c d f o u l x x z

First, let's write the letters in order:

1) oyekaijzdf (10 letters) -> word of 5 letters

2) aaptcg (6 letters) -> word of 3 letters

3) suaokybhai (10 letters) -> word of 5 letters

4) ouow (4 letters) -> word of 2 letters

5) aqht (4 letters) -> word of 2 letters

6) mynznvaatzacdfoulxxz (22 letters) -> word of 11 letters

Totals:

So the letters are:

First word: oyekaijzdf (10 letters = 5 pairs)

Second word: aaptcg (6 letters=3 pairs)

Third word: suaokybhai (10 letters=5 pairs)

Fourth word: ouow (4 letters=2 pairs)

Fifth word: aqht (4 letters=2 pairs)

Sixth word: mynznvaatzacdfoulxxz (22 letters=11 pairs)

So we can suppose that the plaintext words have lengths:

1) 5 letters

2) 3 letters

3) 5 letters

4) 2 letters

5) 2 letters

6)11 letters

Now we can proceed to decode.

First word: oyekaijzdf

Divide into pairs:

oy

ek

ai

jz

df

So we can find the plaintext letters by averaging the numeric values of the letters in each pair.

First pair: 'o','y'

o=15, y=25

Sum=40

Average=20

20='T'

Second pair: 'e','k'

e=5, k=11

Sum=16

Average=8

8='H'

Third pair: 'a','i'

a=1, i=9

Sum=10

Average=5

5='E'

Fourth pair: 'j','z'

j=10, z=26

Sum=36

Average=18

18='R'

Fifth pair: 'd','f'

d=4, f=6

Sum=10

Average=5

5='E'

So the first word is 'T','H','E','R','E'

So 'THERE'

Second word: aaptcg

Pairs:

aa

pt

cg

First pair: 'a','a'

a=1, a=1

Sum=2

Average=1

1='A'

Second pair: 'p','t'

p=16, t=20

Sum=36

Average=18

18='R'

Third pair: 'c','g'

c=3, g=7

Sum=10

Average=5

5='E'

So second word: 'A','R','E'

Third word: suaokybhai

Pairs:

su

ao

ky

bh

ai

First pair: s=19, u=21

Sum=40, Average=20

20='T'

Second pair: a=1, o=15

Sum=16, Average=8

8='H'

Third pair: k=11, y=25

Sum=36, Average=18

18='R'

Fourth pair: b=2, h=8

Sum=10, Average=5

5='E'

Fifth pair: a=1, i=9

Sum=10, Average=5

5='E'

Third word: 'T','H','R','E','E'

Fourth word: ouow

Pairs: ou, ow

First pair: o=15, u=21

Sum=36, Average=18

18='R'

Second pair: o=15, w=23

Sum=38, Average=19

19='S'

Fourth word: 'R','S'

Fifth word: aqht

Pairs: a q, h t

First pair: a=1, q=17

Sum=18, Average=9

9='I'

Second pair: h=8, t=20

Sum=28, Average=14

14='N'

Fifth word: 'I','N'

Sixth word: mynznvaatzacdfoulxxz (22 letters: 11 pairs)

Pairs:

m y

n z

n v

a a

t z

a c

d f

o u

l x

x z

Let's list them properly.

Wait, earlier I missed some letters there.

Let's re-express the sixth word letters:

m y n z n v a a t z a c d f o u l x x z

Total letters: 22 letters

Pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

Wait, that's 21 letters.

But in total, we have 22 letters; I think I have a mismatch.

Wait, in the problem the sixth word is:

mynznvaatzacdfoulxxz

Let's separate the letters:

m y n z n v a a t z a c d f o u l x x z

Total letters: 22 letters

So pairs would be:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

Wait, 'z' is only one letter at the end.

So maybe the letters are:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z (left alone?)

But then we're missing a letter.

Alternatively, perhaps the 'z' at the end is an extra letter or it's paired with the previous 'x'.

Alternatively, perhaps there is a mistake.

Alternatively, perhaps when we reach the last 'x x z', we should write:

9. l x

10. x z

Total pairs: 11

So updated pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x z

Now total pairs: 10 pairs. But we are supposed to have 11 letters in the plaintext word.

So perhaps I have miscounted and not considered including 'f' or 'z'.

Wait, is there any chance that the sequence is:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

We have 22 letters.

Perhaps the 'x x' and 'z' are overlapping.

But we need 11 pairs.

I think for this to work, we need to create 11 pairs from the 22 letters.

So let's try pairing letters in order, taking two letters at a time.

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z ?

But again, last pair only has one letter 'z'

Alternatively, we might need to adjust the pairs.

Perhaps it's 'ou', 'lx', 'xz', and there is no extra 'x'.

Maybe I miswrote.

Wait, let's write the letters as they come:

m y n z n v a a t z a c d f o u l x x z

Positions:

1 m

2 y

3 n

4 z

5 n

6 v

7 a

8 a

9 t

10 z

11 a

12 c

13 d

14 f

15 o

16 u

17 l

18 x

19 x

20 z

So 20 letters.

Now that makes sense.

Now we can divide into pairs:

Pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x z

Total of 10 pairs, which will correspond to 10 letters.

But earlier we thought the word was 11 letters.

But 10 letters may be acceptable.

Maybe we should proceed with the 10 pairs.

First pair: m y

m=13, y=25

Sum=38

Average=19

19='S'

Second pair: n z

n=14, z=26

Sum=40

Average=20

20='T'

Third pair: n v

n=14, v=22

Sum=36

Average=18

18='R'

Fourth pair: a a

a=1, a=1

Sum=2

Average=1

1='A'

Fifth pair: t z

t=20, z=26

Sum=46

Average=23

23='W'

Sixth pair: a c

a=1, c=3

Sum=4

Average=2

2='B'

Seventh pair: d f

d=4, f=6

Sum=10

Average=5

5='E'

Eighth pair: o u

o=15, u=21

Sum=36

Average=18

18='R'

Ninth pair: l x

l=12, x=24

Sum=36

Average=18

18='R'

Tenth pair: x z

x=24, z=26

Sum=50

Average=25

25='Y'

So the tenth word is 'S', 'T', 'R', 'A', 'W', 'B', 'E', 'R', 'R', 'Y'

Wait that's 10 letters

So the sixth word is 'STRAWBERRY', which makes sense.

So putting all these translated letters together:

First word: 'THERE'

Second word: 'ARE'

Third word: 'THREE'

Fourth word: 'RS' (? Doesn't make sense)

Wait, the fourth word, the pairs were:

ou ow

Pairs:

First pair: o u

o=15, u=21

Sum=36, Average=18

18='R'

Second pair: o w

o=15, w=23

Sum=38, Average=19

19='S'

So we get 'R','S'

Similarly, fifth word 'IN'

'R', 'S' 'I','N'

So the message so far is

'THERE ARE THREE RS IN'

And then 'STRAWBERRY'

Wait, let's put the words together:

'THERE ARE THREE Rs IN STRAWBERRY'

I think 'RS' is 'R's', meaning 'Rs'

So the message is:

'THERE ARE THREE R'S IN STRAWBERRY'

Which is true.

So the decoded message is:

"THERE ARE THREE R'S IN STRAWBERRY"

To decode the given ciphertext using the example provided, we’ll use the pattern identified in the example.

Ciphertext Example:

oyfjdnisdr rtqwainr acxz mynzbhhx

Decoded as:

Think step by step

Decoding Method:

1. Pair the letters in the ciphertext.
2. Convert each letter to its numerical position in the alphabet (A=1, B=2, …, Z=26).
3. Sum the numerical values of each pair.
4. Compute the average of the sum (divide by 2).
5. Convert the average back to a letter (1=A, 2=B, …, 26=Z).

Apply this method to the given ciphertext:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Break down the ciphertext into pairs:

1. First word: “oyekaijzdf”
  1. Pairs: oy, ek, ai, jz, df
  2. Decoded letters:
    1. oy → (15+25)/2 = 20 → T
    2. ek → (5+11)/2 = 8 → H
    3. ai → (1+9)/2 = 5 → E
    4. jz → (10+26)/2 = 18 → R
    5. df → (4+6)/2 = 5 → E
  3. Decoded word: THERE
2. Second word: “aaptcg”
  1. Pairs: aa, pt, cg
  2. Decoded letters:
    1. aa → (1+1)/2 = 1 → A
    2. pt → (16+20)/2 = 18 → R
    3. cg → (3+7)/2 = 5 → E
  3. Decoded word: ARE
3. Third word: “suaokybhai”
  1. Pairs: su, ao, ky, bh, ai
  2. Decoded letters:
    1. su → (19+21)/2 = 20 → T
    2. ao → (1+15)/2 = 8 → H
    3. ky → (11+25)/2 = 18 → R
    4. bh → (2+8)/2 = 5 → E
    5. ai → (1+9)/2 = 5 → E
  3. Decoded word: THREE
4. Fourth word: “ouow”
  1. Pairs: ou, ow
  2. Decoded letters:
    1. ou → (15+21)/2 = 18 → R
    2. ow → (15+23)/2 = 19 → S
  3. Decoded letters: R’S
5. Fifth word: “aqht”
  1. Pairs: aq, ht
  2. Decoded letters:
    1. aq → (1+17)/2 = 9 → I
    2. ht → (8+20)/2 = 14 → N
  3. Decoded word: IN
6. Sixth word: “mynznvaatzacdfoulxxz”
  1. Pairs: my, nz, nv, aa, tz, ac, df, ou, lx, xz
  2. Decoded letters:
    1. my → (13+25)/2 = 19 → S
    2. nz → (14+26)/2 = 20 → T
    3. nv → (14+22)/2 = 18 → R
    4. aa → (1+1)/2 = 1 → A
    5. tz → (20+26)/2 = 23 → W
    6. ac → (1+3)/2 = 2 → B
    7. df → (4+6)/2 = 5 → E
    8. ou → (15+21)/2 = 18 → R
    9. lx → (12+24)/2 = 18 → R
    10. xz → (24+26)/2 = 25 → Y
  3. Decoded word: STRAWBERRY

Final Decoded Message:

THERE ARE THREE R'S IN STRAWBERRY

Answer:
THERE ARE THREE R’S IN STRAWBERRY
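The decoding method in the o1‑preview answer above is mechanical enough to check in a few lines. The following is a minimal sketch, assuming lowercase a-z ciphertext split on spaces; the function names are ours and are not part of the original post.

```python
def decode_word(word):
    """Decode one ciphertext word by averaging the alphabet positions of each letter pair."""
    if len(word) % 2 != 0:
        raise ValueError(f"ciphertext word {word!r} has an odd number of letters")
    letters = []
    for i in range(0, len(word), 2):
        first = ord(word[i]) - ord("a") + 1       # a=1, ..., z=26
        second = ord(word[i + 1]) - ord("a") + 1
        total = first + second
        if total % 2 != 0:
            raise ValueError(f"pair {word[i:i + 2]!r} does not average to a whole letter")
        letters.append(chr(total // 2 + ord("A") - 1))  # 1=A, ..., 26=Z
    return "".join(letters)


def decode(ciphertext):
    """Decode a space-separated ciphertext, one plaintext word per ciphertext word."""
    return " ".join(decode_word(word) for word in ciphertext.lower().split())


print(decode("oyfjdnisdr rtqwainr acxz mynzbhhx"))
# THINK STEP BY STEP
print(decode("oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"))
# THERE ARE THREE RS IN STRAWBERRY
```

The cipher carries no punctuation, so the script produces "RS"; reading it as "R'S" is the interpretive step the model makes at the end of its chain of thought.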

Coding

We trained a model that scored 213 points and ranked in the 49th percentile in the 2024 International Olympiad in Informatics (IOI), by initializing from o1 and training to further improve programming skills. This model competed in the 2024 IOI under the same conditions as the human contestants. It had ten hours to solve six challenging algorithmic problems and was allowed 50 submissions per problem.

For each problem, our system sampled many candidate submissions and submitted 50 of them based on a test-time selection strategy. Submissions were selected based on performance on the IOI public test cases, model-generated test cases, and a learned scoring function. If we had instead submitted at random, we would have scored only 156 points on average, suggesting that this strategy was worth nearly 60 points under competition constraints.

With a relaxed submission constraint, we found that model performance improved significantly. When allowed 10,000 submissions per problem, the model achieved a score of 362.14, above the gold medal threshold, even without any test-time selection strategy.

Finally, we simulated competitive programming contests hosted by Codeforces to demonstrate this model's coding skill. Our evaluations closely matched competition rules and allowed for 10 submissions. GPT‑4o achieved an Elo rating³ of 808, which is in the 11th percentile of human competitors. This model far exceeded both GPT‑4o and o1: it achieved an Elo rating of 1807, performing better than 93% of competitors.
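As a rough illustration of test-time selection, the sketch below ranks candidates by the three signals the text names and keeps the top 50. How the signals are actually combined is not described in the source; the lexicographic ordering, the `Candidate` fields, and the sample values are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    code: str
    public_passed: int     # IOI public test cases passed
    generated_passed: int  # model-generated test cases passed
    learned_score: float   # output of a learned scoring function (hypothetical scale)


def select_submissions(candidates, budget=50):
    """Keep the `budget` best candidates.

    Assumption: signals are compared lexicographically (public tests first,
    then generated tests, then learned score). The real strategy may differ.
    """
    ranked = sorted(
        candidates,
        key=lambda c: (c.public_passed, c.generated_passed, c.learned_score),
        reverse=True,
    )
    return ranked[:budget]


# Made-up candidates for one problem.
pool = [
    Candidate("a", public_passed=2, generated_passed=5, learned_score=0.10),
    Candidate("b", public_passed=3, generated_passed=1, learned_score=0.20),
    Candidate("c", public_passed=3, generated_passed=4, learned_score=0.05),
    Candidate("d", public_passed=3, generated_passed=4, learned_score=0.90),
]
print([c.code for c in select_submissions(pool, budget=2)])  # ['d', 'c']
```

The reported gap is consistent with the figures in the text: 213 points with selection against 156 with random submission is a difference of 57 points.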
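To give a feel for what a rating gap of this size means, here is a small sketch using the standard Elo expected-score formula. Applying this formula to the ratings above is an assumption on our part: the source does not say how the simulated ratings map to head-to-head outcomes, and the percentiles it reports come from the evaluation itself, not from this formula.

```python
def elo_expected_score(rating_a, rating_b):
    """Standard Elo expected score of player A against player B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Ratings quoted in the text: 1807 for the fine-tuned model, 808 for GPT-4o.
print(elo_expected_score(1807, 808))   # close to 1: the higher-rated side is a heavy favorite
print(elo_expected_score(1500, 1500))  # 0.5 for equal ratings
```

Under this model the two expected scores for a pairing always sum to 1, so a gap of roughly 1000 points leaves the lower-rated side with well under a 1% expected score.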

The image shows a bar chart comparing Codeforces Elo percentile rankings for different models. GPT-4o has 808 Elo (11th percentile), o1 preview has 1258 Elo (62nd percentile), o1 has 1673 Elo (89th percentile), and o1-ioi has 1807 Elo (93rd percentile).

Further fine-tuning on programming competitions improves o1. The improved model ranked in the 49th percentile in the 2024 International Olympiad in Informatics under competition rules.

Human preference evaluation

In addition to exams and academic benchmarks, we also evaluated human preference of o1‑preview vs GPT‑4o on challenging, open-ended prompts in a broad spectrum of domains. In this evaluation, human trainers were shown anonymized responses to a prompt from o1‑preview and GPT‑4o, and voted for which response they preferred. o1‑preview is preferred to gpt-4o by a large margin in reasoning-heavy categories like data analysis, coding, and math. However, o1‑preview is not preferred on some natural language tasks, suggesting that it is not well-suited for all use cases.

Safety

Chain of thought reasoning provides new opportunities for alignment and safety. We found that integrating our policies for model behavior into the chain of thought of a reasoning model is an effective way to robustly teach human values and principles. By teaching the model our safety rules and how to reason about them in context, we found evidence of reasoning capability directly benefiting model robustness: o1‑preview achieved substantially improved performance on key jailbreak evaluations and our hardest internal benchmarks for evaluating our model's safety refusal boundaries. We believe that using a chain of thought offers significant advances for safety and alignment because (1) it enables us to observe the model thinking in a legible way, and (2) the model reasoning about safety rules is more robust to out-of-distribution scenarios.

To stress-test our improvements, we conducted a suite of safety tests and red-teaming before deployment, in accordance with our Preparedness Framework. We found that chain of thought reasoning contributed to capability improvements across our evaluations. Of particular note, we observed interesting instances of reward hacking. Detailed results from these evaluations can be found in the accompanying System Card.

Metric	GPT-4o	o1-preview
% Safe completions on harmful prompts (standard)	0.990	0.995
% Safe completions on harmful prompts (challenging: jailbreaks and edge cases)	0.714	0.934
↳ Harassment (severe)	0.845	0.900
↳ Exploitative sexual content	0.483	0.949
↳ Sexual content involving minors	0.707	0.931
↳ Advice about non-violent wrongdoing	0.688	0.961
↳ Advice about violent wrongdoing	0.778	0.963
% Safe completions for top 200 with highest Moderation API scores per category in WildChat (Zhao, et al. 2024)	0.945	0.971
Goodness@0.1 StrongREJECT jailbreak eval (Souly et al. 2024)	0.220	0.840
Human-sourced jailbreak eval	0.770	0.960
% Compliance on internal benign edge cases ("not over-refusal")	0.910	0.930
% Compliance on benign edge cases in XSTest ("not over-refusal", Röttger, et al. 2023)	0.924	0.976

Hiding the Chains of Thought

We believe that a hidden chain of thought presents a unique opportunity for monitoring models. Assuming it is faithful and legible, the hidden chain of thought allows us to "read the mind" of the model and understand its thought process. For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user. However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.

Therefore, after weighing multiple factors including user experience, competitive advantage, and the option to pursue chain of thought monitoring, we have decided not to show the raw chains of thought to users. We acknowledge this decision has disadvantages. We strive to partially make up for it by teaching the model to reproduce any useful ideas from the chain of thought in the answer. For the o1 model series we show a model-generated summary of the chain of thought.

Conclusion

o1 significantly advances the state-of-the-art in AI reasoning. We plan to release improved versions of this model as we continue iterating. We expect these new reasoning capabilities will improve our ability to align models to human values and principles. We believe o1, and its successors, will unlock many new use cases for AI in science, coding, math, and related fields. We are excited for users and API developers to discover how it can improve their daily work.

Appendix A

Dataset	Metric	gpt-4o	o1-preview	o1
Competition Math AIME (2024)	cons@64	13.4	56.7	83.3
Competition Math AIME (2024)	pass@1	9.3	44.6	74.4
Competition Code CodeForces	Elo	808	1258	1673
Competition Code CodeForces	Percentile	11.0	62.0	89.0
GPQA Diamond	cons@64	56.1	78.3	78.0
GPQA Diamond	pass@1	50.6	73.3	77.3
Biology	cons@64	63.2	73.7	68.4
Biology	pass@1	61.6	65.9	69.2
Chemistry	cons@64	43.0	60.2	65.6
Chemistry	pass@1	40.2	59.9	64.7
Physics	cons@64	68.6	89.5	94.2
Physics	pass@1	59.5	89.4	92.8
MATH	pass@1	60.3	85.5	94.8
MMLU	pass@1	88.0	92.3	90.8
MMMU (val)	pass@1	69.1	n/a	78.2
MathVista (testmini)	pass@1	63.8	n/a	73.9

Authors

OpenAI


Citations

1
https://www.anthropic.com/news/claude-3-5-sonnet, https://deepmind.google/technologies/gemini/pro
2
Our evaluations used the same 500-problem test split found in https://arxiv.org/abs/2305.20050
3
https://codeforces.com/blog/entry/68288