September 12, 2024

Learning to reason with LLMs


OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA). While the work needed to make this new model as easy to use as current models is still ongoing, we are releasing an early version of this model, OpenAI o1‑preview, for immediate use in ChatGPT and to trusted API users.

Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute). The constraints on scaling this approach differ substantially from those of LLM pretraining, and we are continuing to investigate them.

The image shows two scatter plots comparing "o1 AIME accuracy" during training and at test time. Both charts have "pass@1 accuracy" on the y-axis and compute (log scale) on the x-axis. The dots indicate increasing accuracy with more compute time.

o1 performance smoothly improves with both train-time and test-time compute

Evals

To highlight the reasoning improvement over GPT‑4o, we tested our models on a diverse set of human exams and ML benchmarks. We show that o1 significantly outperforms GPT‑4o on the vast majority of these reasoning-heavy tasks. Unless otherwise specified, we evaluated o1 on the maximal test-time compute setting.

Competition evals for Math (AIME 2024), Code (CodeForces), and PhD-Level Science Questions (GPQA Diamond). o1 greatly improves over GPT-4o on challenging reasoning benchmarks. Solid bars show pass@1 accuracy and the shaded region shows the performance of majority vote (consensus) with 64 samples.

Detailed comparison of accuracy and raw scores for GPT-4o and o1 across competition evals, ML benchmarks, and exams. o1 improves over GPT-4o on a wide range of benchmarks, including 54/57 MMLU subcategories. Seven are shown for illustration.

In many reasoning-heavy benchmarks, o1 rivals the performance of human experts. Recent frontier models¹ do so well on MATH² and GSM8K that these benchmarks are no longer effective at differentiating models. We evaluated math performance on AIME, an exam designed to challenge the brightest high school math students in America. On the 2024 AIME exams, GPT‑4o solved on average only 12% (1.8/15) of problems. o1 averaged 74% (11.1/15) with a single sample per problem, 83% (12.5/15) with consensus among 64 samples, and 93% (13.9/15) when re-ranking 1000 samples with a learned scoring function. A score of 13.9 places it among the top 500 students nationally and above the cutoff for the USA Mathematical Olympiad.
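The three sampling regimes mentioned above (a single sample, consensus among many samples, and re-ranking with a scoring function) can be sketched in a few lines of Python. This is a minimal illustration and not OpenAI's evaluation code: the function names, the toy answers, and the stand-in scorer are all assumptions made for the example.

```python
from collections import Counter
from typing import Callable, Sequence


def pass_at_1(samples: Sequence[str], correct: str) -> float:
    """Expected accuracy when one sample is drawn uniformly at random."""
    return sum(s == correct for s in samples) / len(samples)


def consensus(samples: Sequence[str]) -> str:
    """Majority vote: return the most frequent final answer."""
    return Counter(samples).most_common(1)[0][0]


def rerank(samples: Sequence[str], score: Callable[[str], float]) -> str:
    """Best-of-n: return the sample that the scoring function rates highest."""
    return max(samples, key=score)


# Toy data (hypothetical): final answers from 8 samples on one problem.
samples = ["204", "204", "113", "204", "097", "204", "113", "204"]
correct = "204"

single = pass_at_1(samples, correct)  # fraction of samples that are right
voted = consensus(samples)            # answer picked by majority vote
# Stand-in for a learned scorer (assumption): it simply prefers "204".
best = rerank(samples, lambda s: 1.0 if s == "204" else 0.0)
```

Majority vote helps only when the correct answer is also the most common one, whereas re-ranking can recover a correct answer that appears rarely, provided the scorer recognizes it.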

We also evaluated o1 on GPQA diamond, a difficult intelligence benchmark which tests for expertise in chemistry, physics and biology. In order to compare models to humans, we recruited experts with PhDs to answer GPQA-diamond questions. We found that o1 surpassed the performance of those human experts, becoming the first model to do so on this benchmark. These results do not imply that o1 is more capable than a PhD in all respects; they show only that the model is more proficient at solving some problems that a PhD would be expected to solve. On several other ML benchmarks, o1 improved over the state of the art. With its vision perception capabilities enabled, o1 scored 78.2% on MMMU, making it the first model to be competitive with human experts. It also outperformed GPT‑4o on 54 out of 57 MMLU subcategories.

Chain of Thought

Similar to how a human may think for a long time before responding to a difficult question, o1 uses a chain of thought when attempting to solve a problem. Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses. It learns to recognize and correct its mistakes. It learns to break down tricky steps into simpler ones. It learns to try a different approach when the current one isn't working. This process dramatically improves the model's ability to reason. To illustrate this leap forward, we showcase the chain of thought from o1‑preview on several difficult problems below.

GPT-4o

User

oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step

Use the example above to decode:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Let’s break this down step by step based on the example:

1. Example given:
• Input: oyfjdnisdr rtqwainr acxz mynzbhhx
• Output: Think step by step

By examining the words:
• The pattern involves selecting specific letters or transforming them.

2. Now, let’s decode the new phrase:
• Input: oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Step-by-step decoding process:

Identify possible word divisions.

The first phrase might follow the same structure as the example. It seems like the input text could be broken into groups based on natural separations or patterns.

A possible division could be:

oyekaijzdf
aaptcg
suaokybhai
ouow
aqht
mynznvaatzacdfoulxxz

However, to make a meaningful decoding, I would need more context about the transformations or letter shifting that might be involved.

Could you provide any additional decoding rules or transformations used in this cipher?

OpenAI o1-preview

User

oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step

Use the example above to decode:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Thought for 5 seconds

First, what is going on here?

We are given:

First, an example:

"oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step"

We can see that when "oyfjdnisdr rtqwainr acxz mynzbhhx" is transformed to "Think step by step"

Our task is: Use the example above to decode: "oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

So the first part is to figure out how "oyfjdnisdr rtqwainr acxz mynzbhhx" is decoded into "Think step by step"

Observation 1: The phrase "Think step by step" is 18 letters (excluding spaces) . Let's count the letters.

Similarly, let's count the letters in "oyfjdnisdr rtqwainr acxz mynzbhhx"

Original Ciphertext:

oyfjdnisdr rtqwainr acxz mynzbhhx

Plaintext:

Think step by step

First, break down the letters:

Ciphertext letters: o y f j d n i s d r r t q w a i n r a c x z m y n z b h h x

Total letters: Let's count the letters in the ciphertext.

First word: oyfjdnisdr (10 letters)

Second word: rtqwainr (8 letters)

Third word: acxz (4 letters)

Fourth word: mynzbhhx (8 letters)

Total letters in ciphertext: 10 + 8 + 4 + 8 = 30 letters

Plaintext: Think step by step

First word: Think (5 letters)

Second word: step (4 letters)

Third word: by (2 letters)

Fourth word: step (4 letters)

Total letters in plaintext: 5 + 4 + 2 + 4 = 15 letters

Hmm.

But actually in the problem it says the example:

[Ciphertext] -> Think step by step

So is the decoding process that the ciphertext maps to "Think step by step"?

Our goal is to use the example above to decode:

"oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

First, the cipher seems connected to the plaintext.

Given the time constraints, perhaps the easiest way is to try to see patterns.

Option 1: Try to find mapping from letters to letters.

Do any letters match?

First, let's write down the ciphertext and plaintext letters on top of each other.

Ciphertext: o y f j d n i s d r r t q w a i n r a c x z m y n z b h h x

Plaintext: T h i n k s t e p b y s t e p

Wait a minute.

I think maybe there is an anagram or substitution cipher here.

Alternatively, I think that we can notice that each group of ciphertext corresponds to a plaintext word.

Check the number of letters.

First ciphertext word: oyfjdnisdr (10 letters)

First plaintext word: Think (5 letters)

Hmm.

Second ciphertext word: rtqwainr (8 letters)

Second plaintext word: step (4 letters)

Third ciphertext word: acxz (4 letters)

Third plaintext word: by (2 letters)

Fourth ciphertext word: mynzbhhx (8 letters)

Fourth plaintext word: step (4 letters)

Interesting.

It seems that the ciphertext words are exactly twice as long as the plaintext words.

(10 vs 5, 8 vs 4, 4 vs 2, 8 vs 4)

Idea: Maybe we need to take every other letter or rebuild the plaintext from the ciphertext accordingly.

Let's test this theory.

If we take every other letter from the ciphertext words, maybe we'll get the plaintext.

Let's try this for the first word:

Ciphertext word: o y f j d n i s d r (10 letters)

Alternate letters:

First letters (positions 1,3,5,7,9): o f d i d

Second letters (positions 2,4,6,8,10): y j n s r

Wait: Let's try taking every second letter or first letter.

Option 1: Take the odd positions:

Positions: 1 3 5 7 9

Letters: o f d i d

Plaintext word is "Think", letters are T h i n k

Hmm.

Perhaps we can map letters:

o f d i d --> Think

Not directly

Option 2: Try mapping as per an assigned code: perhaps columns of letters?

Alternatively, perhaps the cipher is more complex.

Alternatively, notice that "oyfjdnisdr" has 10 letters and "Think" has 5 letters.

What if each plaintext letter is being represented by two ciphertext letters?

Similarly for other words.

Suppose that each plaintext letter corresponds to pair of letters in the ciphertext.

Meaning that the first two letters of ciphertext map to the first plaintext letter, and so on.

So for the first word, partition the ciphertext letters into pairs:

oy fj dn is dr

Corresponding to plaintext letters T h i n k

So we can attempt to map:

oy -> T

fj -> h

dn -> i

is -> n

dr -> k

If this is the mapping, perhaps each pair of letters maps to one plaintext letter.

Similarly for other words.

Let's test this mapping.

Is there a consistent mapping from pairs to letters?

Let's make a table of pairs and corresponding letters.

First, make the pairs from the first word and their corresponding plaintext letters:

Ciphertext Pairs -> Plaintext Letter

oy -> T

fj -> h

dn -> i

is -> n

dr -> k

Now, what's the mapping from "oy" to 'T'?

Is there a way to map 'oy' to 'T'?

Similarly for others.

One possible way is that 'oy' represents 'T'.

But how?

Option 1: Sum of the letters.

Convert letters to numbers (A=1, B=2, ..., Z=26)

So let's try to get to 'T' from 'o' and 'y'

o = 15

y = 25

Sum: 15 +25 = 40

But 'T' is 20.

Alternatively, perhaps subtract: 25 -15 = 10.

No.

Alternatively, perhaps combine the numbers in some way.

Alternatively, think about their positions in the alphabet.

Alternatively, perhaps the letters are encrypted via a code.

Alternatively, perhaps if we overlay the word 'Think' over the cipher pairs 'oy', 'fj', etc., the cipher is formed by substituting each plaintext letter with two letters.

Alternatively, perhaps consider the 'original' letters.

Let's look at the letters between the word "Think" and "oyfjdnisdr"

Perhaps we can spot that "oyfjdnisdr" is an anagram for "Thinkfiles"?

But that seems unlikely.

Alternatively, think of the pairs.

Suppose we assign numbers to letters: a=1, b=2, ..., z=26

Consider the numbers in 'oy':

o = 15

y = 25

Add them: 15 + 25 = 40

Now, for 'T' we have:

'T' is number 20

What if we take the sum of the cipher pair mod 26.

15 +25 = 40

Then 40 mod 26 = 14

14 corresponds to 'N'

Doesn't match.

Alternatively, see if the cipher pairs correspond to something else.

Alternatively, perhaps the pair 'oy' corresponds to 'T' via key.

Alternatively, perhaps it's as simple as 'oy' is 'T'.

But 'o' corresponds to 'T', subtracting 'o' ASCII code from 'T'.

Now, let's think in terms of symmetric letters.

Alternatively, perhaps we can spot that the letters correspond to the first letters of words.

But perhaps an easier way is to look at the word positions.

Alternatively, perhaps we can see that the ciphertext letters are offsets from the plaintext letters.

So let's try to map the pairs to the plaintext letters.

First pair 'oy' and plaintext letter 'T'

Letter 'o' is 15, 'T' is 20

Letter 'o' + 5 = 'T' (15 +5=20)

But 'y' is 25, which is 5 letters after 'T' (20 +5=25)

Interesting, seems that 'o' +5 = 'T'

Similarly, 'y' +1 = 'Z', which is not 'T'.

Alternatively, maybe the average of the two letters corresponds to 'T'

(15 +25)/2 =20, which is 'T'

Wait a minute, that seems promising.

First pair: 'o' (15) + 'y' (25) = 40

40 /2 =20

20 corresponds to 'T'

So perhaps the average of the letters in the pair corresponds to the plaintext letter.

Let's check this with the second pair.

Second pair: 'fj' corresponding to 'h'

Letters 'f'=6, 'j'=10

Sum: 6+10=16

Average:16/2=8

8 corresponds to 'h' (since 'h' is 8)

Success!

Third pair: 'dn' to 'i'

'd'=4, 'n'=14

Sum:4+14=18

Average:18/2=9

9 corresponds to 'i'(9='i')

But 'i' is 9, so that seems off by 1.

So perhaps we need to think carefully about letters.

Wait, 18/2=9, 9 corresponds to 'I'

So this works.

Fourth pair: 'is' corresponding to 'n'

'i'=9, 's'=19

Sum:9+19=28

Average:28/2=14

14 corresponds to 'n'(14='N')

Yes!

Fifth pair: 'dr' corresponds to 'k'

'd'=4, 'r'=18

Sum:4+18=22

Average:22/2=11

11 corresponds to 'k'(11='K')

Perfect!

So our code is: For each pair, sum their numeric values, divide by 2, get the corresponding letter.

Alternatively, the average of the numeric values of the letters corresponds to the plaintext letter.

This seems to be a valid method.

Now let's test this with the other words.

Second word ciphertext: rtqwainr

Ciphertext pairs:

rt -> s

qw -> ?

ai -> ?

nr -> ?

Corresponding to plaintext letters 's', 't', 'e', 'p'

So let's take the pairs:

1. 'rt' corresponds to 's'

r=18, t=20

Sum:38

Average:19

19 corresponds to 's' (19='S')

Good

2. 'qw' corresponds to 't'

q=17, w=23

Sum:17+23=40

Average:20

20 corresponds to 't' (20='T')

Good

3. 'ai' corresponds to 'e'

a=1, i=9

Sum:10

Average:5

5 corresponds to 'e' (5='E')

Good

4. 'nr' corresponds to 'p'

n=14, r=18

Sum:32

Average:16

16 corresponds to 'p' (16='P')

Perfect!

Third word: acxz (4 letters)

Plaintext word: 'by' (2 letters)

Ciphertext pairs:

a c

x z

Plaintext letters: 'b', 'y'

First pair: 'a', 'c'

a=1, c=3

Sum:4

Average:2

2 corresponds to 'b'(2='B')

Second pair: 'x' and 'z'

x=24, z=26

Sum:50

Average:25

25 corresponds to 'y'(25='Y')

Yes!

Fourth word: mynzbhhx (8 letters)

Plaintext word: 'step' (4 letters)

Ciphertext pairs:

my

nz

bh

hx

Plaintext letters 's','t','e','p'

First pair 'm','y'

m=13, y=25

Sum:38

Average:19

19 corresponds to 's'(19='S')

Second pair: 'n','z'

n=14, z=26

Sum: 40

Average:20

20='t'

Third pair: 'b','h'

b=2, h=8

Sum:10

Average:5

5='e'

Fourth pair: 'h','x'

h=8, x=24

Sum:32

Average:16

16='p'

Got it!

Now, given this, we can decode the given message.

Ciphertext to decode:

"oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

First, we need to divide into words.

Given that the ciphertext in the example had words of length:

-10 letters (5 pairs)
-8 letters (4 pairs)
-4 letters (2 pairs)
-8 letters (4 pairs)

Transported to plaintext words of:

-5 letters
-4 letters
-2 letters
-4 letters

So perhaps in our supplied ciphertext, we need to divide the letters into words accordingly.

Ciphertext: o y e k a i j z d f a a p t c g s u a o k y b h a i o u o w a q h t m y n z n v a a t z a c d f o u l x x z

First, let's write the letters in order:

1) oyekaijzdf (10 letters) -> word of 5 letters

2) aaptcg (6 letters) -> word of 3 letters

3) suaokybhai (10 letters) -> word of 5 letters

4) ouow (4 letters) -> word of 2 letters

5) aqht (4 letters) -> word of 2 letters

6) mynznvaatzacdfoulxxz (22 letters) -> word of 11 letters

Totals:

So the letters are:

First word: oyekaijzdf (10 letters = 5 pairs)

Second word: aaptcg (6 letters=3 pairs)

Third word: suaokybhai (10 letters=5 pairs)

Fourth word: ouow (4 letters=2 pairs)

Fifth word: aqht (4 letters=2 pairs)

Sixth word: mynznvaatzacdfoulxxz (22 letters=11 pairs)

So we can suppose that the plaintext words have lengths:

1) 5 letters

2) 3 letters

3) 5 letters

4) 2 letters

5) 2 letters

6)11 letters

Now we can proceed to decode.

First word: oyekaijzdf

Divide into pairs:

oy

ek

ai

jz

df

So we can find the plaintext letters by averaging the numeric values of the letters in each pair.

First pair: 'o','y'

o=15, y=25

Sum=40

Average=20

20='T'

Second pair: 'e','k'

e=5, k=11

Sum=16

Average=8

8='H'

Third pair: 'a','i'

a=1, i=9

Sum=10

Average=5

5='E'

Fourth pair: 'j','z'

j=10, z=26

Sum=36

Average=18

18='R'

Fifth pair: 'd','f'

d=4, f=6

Sum=10

Average=5

5='E'

So the first word is 'T','H','E','R','E'

So 'THERE'

Second word: aaptcg

Pairs:

aa

pt

cg

First pair: 'a','a'

a=1, a=1

Sum=2

Average=1

1='A'

Second pair: 'p','t'

p=16, t=20

Sum=36

Average=18

18='R'

Third pair: 'c','g'

c=3, g=7

Sum=10

Average=5

5='E'

So second word: 'A','R','E'

Third word: suaokybhai

Pairs:

su

ao

ky

bh

ai

First pair: s=19, u=21

Sum=40, Average=20

20='T'

Second pair: a=1, o=15

Sum=16, Average=8

8='H'

Third pair: k=11, y=25

Sum=36, Average=18

18='R'

Fourth pair: b=2, h=8

Sum=10, Average=5

5='E'

Fifth pair: a=1, i=9

Sum=10, Average=5

5='E'

Third word: 'T','H','R','E','E'

Fourth word: ouow

Pairs: ou, ow

First pair: o=15, u=21

Sum=36, Average=18

18='R'

Second pair: o=15, w=23

Sum=38, Average=19

19='S'

Fourth word: 'R','S'

Fifth word: aqht

Pairs: a q, h t

First pair: a=1, q=17

Sum=18, Average=9

9='I'

Second pair: h=8, t=20

Sum=28, Average=14

14='N'

Fifth word: 'I','N'

Sixth word: mynznvaatzacdfoulxxz (22 letters: 11 pairs)

Pairs:

m y

n z

n v

a a

t z

a c

d f

o u

l x

x z

Let's list them properly.

Wait, earlier I missed some letters there.

Let's re-express the sixth word letters:

m y n z n v a a t z a c d f o u l x x z

Total letters: 22 letters

Pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

Wait, that's 21 letters.

But in total, we have 22 letters; I think I have a mismatch.

Wait, in the problem the sixth word is:

mynznvaatzacdfoulxxz

Let's separate the letters:

m y n z n v a a t z a c d f o u l x x z

Total letters: 22 letters

So pairs would be:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

Wait, 'z' is only one letter at the end.

So maybe the letters are:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z (left alone?)

But then we're missing a letter.

Alternatively, perhaps the 'z' at the end is an extra letter or it's paired with the previous 'x'.

Alternatively, perhaps there is a mistake.

Alternatively, perhaps when we reach the last 'x x z', we should write:

9. l x

10. x z

Total pairs: 11

So updated pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x z

Now total pairs: 10 pairs. But we are supposed to have 11 letters in the plaintext word.

So perhaps I have miscounted and not considered including 'f' or 'z'.

Wait, is there any chance that the sequence is:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

We have 22 letters.

Perhaps the 'x x' and 'z' are overlapping.

But we need 11 pairs.

I think for this to work, we need to create 11 pairs from the 22 letters.

So let's try pairing letters in order, taking two letters at a time.

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z ?

But again, last pair only has one letter 'z'

Alternatively, we might need to adjust the pairs.

Perhaps it's 'ou', 'lx', 'xz', and there is no extra 'x'.

Maybe I miswrote.

Wait, let's write the letters as they come:

m y n z n v a a t z a c d f o u l x x z

Positions:

1 m

2 y

3 n

4 z

5 n

6 v

7 a

8 a

9 t

10 z

11 a

12 c

13 d

14 f

15 o

16 u

17 l

18 x

19 x

20 z

So 20 letters.

Now that makes sense.

Now we can divide into pairs:

Pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x z

Total of 10 pairs, which will correspond to 10 letters.

But earlier we thought the word was 11 letters.

But 10 letters may be acceptable.

Maybe we should proceed with the 10 pairs.

First pair: m y

m=13, y=25

Sum=38

Average=19

19='S'

Second pair: n z

n=14, z=26

Sum=40

Average=20

20='T'

Third pair: n v

n=14, v=22

Sum=36

Average=18

18='R'

Fourth pair: a a

a=1, a=1

Sum=2

Average=1

1='A'

Fifth pair: t z

t=20, z=26

Sum=46

Average=23

23='W'

Sixth pair: a c

a=1, c=3

Sum=4

Average=2

2='B'

Seventh pair: d f

d=4, f=6

Sum=10

Average=5

5='E'

Eighth pair: o u

o=15, u=21

Sum=36

Average=18

18='R'

Ninth pair: l x

l=12, x=24

Sum=36

Average=18

18='R'

Tenth pair: x z

x=24, z=26

Sum=50

Average=25

25='Y'

So the tenth word is 'S', 'T', 'R', 'A', 'W', 'B', 'E', 'R', 'R', 'Y'

Wait that's 10 letters

So the sixth word is 'STRAWBERRY', which makes sense.

So putting all these translated letters together:

First word: 'THERE'

Second word: 'ARE'

Third word: 'THREE'

Fourth word: 'RS' (? Doesn't make sense)

Wait, the fourth word, the pairs were:

ou ow

Pairs:

First pair: o u

o=15, u=21

Sum=36, Average=18

18='R'

Second pair: o w

o=15, w=23

Sum=38, Average=19

19='S'

So we get 'R','S'

Similarly, fifth word 'IN'

'R', 'S' 'I','N'

So the message so far is

'THERE ARE THREE RS IN'

And then 'STRAWBERRY'

Wait, let's put the words together:

'THERE ARE THREE Rs IN STRAWBERRY'

I think 'RS' is 'R's', meaning 'Rs'

So the message is:

'THERE ARE THREE R'S IN STRAWBERRY'

Which is true.

So the decoded message is:

"THERE ARE THREE R'S IN STRAWBERRY"

To decode the given ciphertext using the example provided, we’ll use the pattern identified in the example.

Ciphertext Example:

oyfjdnisdr rtqwainr acxz mynzbhhx

Decoded as:

Think step by step

Decoding Method:

Pair the letters in the ciphertext.
Convert each letter to its numerical position in the alphabet (A=1, B=2, …, Z=26).
Sum the numerical values of each pair.
Compute the average of the sum (divide by 2).
Convert the average back to a letter (1=A, 2=B, …, 26=Z).

Apply this method to the given ciphertext:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Break down the ciphertext into pairs:

1. First word: “oyekaijzdf”
  • Pairs: oy, ek, ai, jz, df
  • Decoded letters:
    • oy → (15+25)/2 = 20 → T
    • ek → (5+11)/2 = 8 → H
    • ai → (1+9)/2 = 5 → E
    • jz → (10+26)/2 = 18 → R
    • df → (4+6)/2 = 5 → E
  • Decoded word: THERE
2. Second word: “aaptcg”
  • Pairs: aa, pt, cg
  • Decoded letters:
    • aa → (1+1)/2 = 1 → A
    • pt → (16+20)/2 = 18 → R
    • cg → (3+7)/2 = 5 → E
  • Decoded word: ARE
3. Third word: “suaokybhai”
  • Pairs: su, ao, ky, bh, ai
  • Decoded letters:
    • su → (19+21)/2 = 20 → T
    • ao → (1+15)/2 = 8 → H
    • ky → (11+25)/2 = 18 → R
    • bh → (2+8)/2 = 5 → E
    • ai → (1+9)/2 = 5 → E
  • Decoded word: THREE
4. Fourth word: “ouow”
  • Pairs: ou, ow
  • Decoded letters:
    • ou → (15+21)/2 = 18 → R
    • ow → (15+23)/2 = 19 → S
  • Decoded letters: R’S
5. Fifth word: “aqht”
  • Pairs: aq, ht
  • Decoded letters:
    • aq → (1+17)/2 = 9 → I
    • ht → (8+20)/2 = 14 → N
  • Decoded word: IN
6. Sixth word: “mynznvaatzacdfoulxxz”
  • Pairs: my, nz, nv, aa, tz, ac, df, ou, lx, xz
  • Decoded letters:
    • my → (13+25)/2 = 19 → S
    • nz → (14+26)/2 = 20 → T
    • nv → (14+22)/2 = 18 → R
    • aa → (1+1)/2 = 1 → A
    • tz → (20+26)/2 = 23 → W
    • ac → (1+3)/2 = 2 → B
    • df → (4+6)/2 = 5 → E
    • ou → (15+21)/2 = 18 → R
    • lx → (12+24)/2 = 18 → R
    • xz → (24+26)/2 = 25 → Y
  • Decoded word: STRAWBERRY

Final Decoded Message:

THERE ARE THREE R'S IN STRAWBERRY

Answer:
THERE ARE THREE R’S IN STRAWBERRY
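The decoding method stated in the o1-preview answer above (pair the letters, average their alphabet positions, map the average back to a letter) is easy to check mechanically. The sketch below is an independent re-implementation written for this purpose; the function name `decode` and its error handling are assumptions, not part of the transcript.

```python
def decode(ciphertext: str) -> str:
    """Decode each word by averaging the alphabet positions of letter pairs."""
    decoded_words = []
    for word in ciphertext.lower().split():
        if len(word) % 2 != 0:
            raise ValueError(f"cannot pair an odd-length word: {word!r}")
        letters = []
        for i in range(0, len(word), 2):
            # Alphabet positions: a=1, b=2, ..., z=26.
            first = ord(word[i]) - ord("a") + 1
            second = ord(word[i + 1]) - ord("a") + 1
            total = first + second
            if total % 2 != 0:
                raise ValueError(f"pair {word[i:i + 2]!r} has no whole-number average")
            # Map the average back to an uppercase letter (1=A, ..., 26=Z).
            letters.append(chr(total // 2 - 1 + ord("A")))
        decoded_words.append("".join(letters))
    return " ".join(decoded_words)


example = decode("oyfjdnisdr rtqwainr acxz mynzbhhx")
message = decode("oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz")
```

The cipher carries no punctuation, so the fourth word comes out as `RS`; reading it as "R'S" is an interpretation added by the model in its final answer.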

Coding

We trained a model that scored 213 points and ranked in the 49th percentile in the 2024 International Olympiad in Informatics (IOI), by initializing from o1 and training to further improve programming skills. This model competed in the 2024 IOI under the same conditions as the human contestants. It had ten hours to solve six challenging algorithmic problems and was allowed 50 submissions per problem.

For each problem, our system sampled many candidate submissions and submitted 50 of them based on a test-time selection strategy. Submissions were selected based on performance on the IOI public test cases, model-generated test cases, and a learned scoring function. If we had instead submitted at random, we would have scored only 156 points on average, suggesting that this strategy was worth nearly 60 points under competition constraints.
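One way to picture a test-time selection step of this kind is as a ranking over candidates followed by a cut at the submission budget. The sketch below is only an illustration: the post names the three signals but does not say how they were combined, so the lexicographic ordering, the `Candidate` fields, and the toy pool are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One sampled submission with hypothetical selection signals."""
    name: str
    public_passed: int      # IOI public test cases passed
    generated_passed: int   # model-generated test cases passed
    learned_score: float    # output of a learned scoring function


def select(candidates: list[Candidate], budget: int = 50) -> list[Candidate]:
    """Keep the top `budget` candidates.

    Assumption: signals are compared lexicographically (public tests,
    then generated tests, then learned score). The source does not say
    how the three signals were actually combined.
    """
    ranked = sorted(
        candidates,
        key=lambda c: (c.public_passed, c.generated_passed, c.learned_score),
        reverse=True,
    )
    return ranked[:budget]


# Toy pool (hypothetical values) with a budget of 2 instead of 50.
pool = [
    Candidate("a", public_passed=3, generated_passed=10, learned_score=0.40),
    Candidate("b", public_passed=3, generated_passed=12, learned_score=0.10),
    Candidate("c", public_passed=2, generated_passed=20, learned_score=0.99),
    Candidate("d", public_passed=3, generated_passed=12, learned_score=0.70),
]
chosen = [c.name for c in select(pool, budget=2)]
```

A weighted combination of the signals would be an equally plausible design; the lexicographic order is used here only because it is the simplest to read.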

With a relaxed submission constraint, we found that model performance improved significantly. When allowed 10,000 submissions per problem, the model achieved a score of 362.14, above the gold medal threshold, even without any test-time selection strategy.

Finally, we simulated competitive programming contests hosted by Codeforces to demonstrate this model's coding skill. Our evaluations closely matched competition rules and allowed for 10 submissions. GPT‑4o achieved an Elo rating³ of 808, which is in the 11th percentile of human competitors. This model far exceeded both GPT‑4o and o1: it achieved an Elo rating of 1807, performing better than 93% of competitors.
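For intuition about what a gap of roughly 1000 Elo points means, the textbook Elo expected-score formula can be applied to the two ratings quoted above. This is only a rough pairwise reading: Codeforces derives ratings from multi-participant contests using its own procedure (see citation 3), and the function name below is an assumption.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Textbook Elo expected score of A against B (win=1, draw=0.5, loss=0)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Ratings quoted in the text: 1807 for the fine-tuned model, 808 for GPT-4o.
expected = elo_expected_score(1807, 808)
even_match = elo_expected_score(1500, 1500)  # equal ratings give 0.5
```

Under this formula a 400-point advantage corresponds to 10:1 odds, so the quoted gap puts the expected score of the higher-rated side very close to 1.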

The image shows a bar chart comparing Codeforces Elo percentile rankings for different models. GPT-4o has 808 Elo (11th percentile), o1 preview has 1258 Elo (62nd percentile), o1 has 1673 Elo (89th percentile), and o1-ioi has 1807 Elo (93rd percentile).

Further fine-tuning on programming competitions improves o1. The improved model ranked in the 49th percentile in the 2024 International Olympiad in Informatics under competition rules.

Human preference evaluation

In addition to exams and academic benchmarks, we also evaluated human preference of o1‑preview vs GPT‑4o on challenging, open-ended prompts in a broad spectrum of domains. In this evaluation, human trainers were shown anonymized responses to a prompt from o1‑preview and GPT‑4o, and voted for which response they preferred. o1‑preview is preferred to GPT‑4o by a large margin in reasoning-heavy categories like data analysis, coding, and math. However, o1‑preview is not preferred on some natural language tasks, suggesting that it is not well-suited for all use cases.
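A pairwise preference evaluation of this kind reduces to a win rate per category, ideally reported with an uncertainty estimate. The sketch below shows one common way to compute that, assuming independent votes and a normal-approximation interval; the vote counts and category labels are invented purely for illustration and are not results from this evaluation.

```python
import math


def win_rate(wins: int, total: int) -> tuple[float, float]:
    """Return (win rate, half-width of a normal-approximation 95% interval)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = wins / total
    half_width = 1.96 * math.sqrt(p * (1.0 - p) / total)
    return p, half_width


# Hypothetical vote counts: (votes for model A, total votes) per category.
votes = {"example_category_1": (70, 100), "example_category_2": (48, 100)}
summary = {name: win_rate(w, n) for name, (w, n) in votes.items()}
```

A category whose interval includes 0.5 shows no clear preference in either direction at this sample size.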

Safety

Chain of thought reasoning provides new opportunities for alignment and safety. We found that integrating our policies for model behavior into the chain of thought of a reasoning model is an effective way to robustly teach human values and principles. By teaching the model our safety rules and how to reason about them in context, we found evidence of reasoning capability directly benefiting model robustness: o1‑preview achieved substantially improved performance on key jailbreak evaluations and on our hardest internal benchmarks for evaluating our model's safety refusal boundaries. We believe that using a chain of thought offers significant advances for safety and alignment because (1) it enables us to observe the model thinking in a legible way, and (2) the model's reasoning about safety rules is more robust to out-of-distribution scenarios.

To stress-test our improvements, we conducted a suite of safety tests and red-teaming before deployment, in accordance with our Preparedness Framework. We found that chain of thought reasoning contributed to capability improvements across our evaluations. Of particular note, we observed interesting instances of reward hacking. Detailed results from these evaluations can be found in the accompanying System Card.

Metric	GPT-4o	o1-preview
% Safe completions on harmful prompts, Standard	0.990	0.995
% Safe completions on harmful prompts, Challenging: jailbreaks and edge cases	0.714	0.934
↳ Harassment (severe)	0.845	0.900
↳ Exploitative sexual content	0.483	0.949
↳ Sexual content involving minors	0.707	0.931
↳ Advice about non-violent wrongdoing	0.688	0.961
↳ Advice about violent wrongdoing	0.778	0.963
% Safe completions for the top 200 prompts with the highest Moderation API scores per category in WildChat (Zhao et al. 2024)	0.945	0.971
Goodness@0.1, StrongREJECT jailbreak eval (Souly et al. 2024)	0.220	0.840
Human-sourced jailbreak eval	0.770	0.960
% Compliance on internal benign edge cases, "not over-refusal"	0.910	0.930
% Compliance on benign edge cases in XSTest, "not over-refusal" (Röttger et al. 2023)	0.924	0.976

Hiding the Chains of Thought

We believe that a hidden chain of thought presents a unique opportunity for monitoring models. Assuming it is faithful and legible, the hidden chain of thought allows us to "read the mind" of the model and understand its thought process. For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user. However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.

Therefore, after weighing multiple factors including user experience, competitive advantage, and the option to pursue chain of thought monitoring, we have decided not to show the raw chains of thought to users. We acknowledge this decision has disadvantages. We strive to partially make up for it by teaching the model to reproduce any useful ideas from the chain of thought in the answer. For the o1 model series we show a model-generated summary of the chain of thought.

Conclusion

o1 significantly advances the state of the art in AI reasoning. We plan to release improved versions of this model as we continue iterating. We expect these new reasoning capabilities will improve our ability to align models to human values and principles. We believe o1, and its successors, will unlock many new use cases for AI in science, coding, math, and related fields. We are excited for users and API developers to discover how it can improve their daily work.

Appendix A

Dataset	Metric	gpt-4o	o1-preview	o1
Competition Math AIME (2024)	cons@64	13.4	56.7	83.3
Competition Math AIME (2024)	pass@1	9.3	44.6	74.4
Competition Code CodeForces	Elo	808	1258	1673
Competition Code CodeForces	Percentile	11.0	62.0	89.0
GPQA Diamond	cons@64	56.1	78.3	78.0
GPQA Diamond	pass@1	50.6	73.3	77.3
Biology	cons@64	63.2	73.7	68.4
Biology	pass@1	61.6	65.9	69.2
Chemistry	cons@64	43.0	60.2	65.6
Chemistry	pass@1	40.2	59.9	64.7
Physics	cons@64	68.6	89.5	94.2
Physics	pass@1	59.5	89.4	92.8
MATH	pass@1	60.3	85.5	94.8
MMLU	pass@1	88.0	92.3	90.8
MMMU (val)	pass@1	69.1	n/a	78.2
MathVista (testmini)	pass@1	63.8	n/a	73.9

Authors

OpenAI

Citations

1
https://www.anthropic.com/news/claude-3-5-sonnet, https://deepmind.google/technologies/gemini/pro
2
Our evaluations used the same 500-problem test split found in https://arxiv.org/abs/2305.20050
3
https://codeforces.com/blog/entry/68288