September 12, 2024

Learning to reason with LLMs


OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA). While the work needed to make this new model as easy to use as current models is still ongoing, we are releasing an early version of this model, OpenAI o1-preview, for immediate use in ChatGPT and to trusted API users.

Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute). The constraints on scaling this approach differ substantially from those of LLM pretraining, and we are continuing to investigate them.

The image shows two scatter plots comparing "o1 AIME accuracy" during training and at test time. Both charts have "pass@1 accuracy" on the y-axis and compute (log scale) on the x-axis. The dots indicate increasing accuracy with more compute time.

o1 performance smoothly improves with both train-time and test-time compute

Evals

To highlight the reasoning improvement over GPT-4o, we tested our models on a diverse set of human exams and ML benchmarks. We show that o1 significantly outperforms GPT-4o on the vast majority of these reasoning-heavy tasks. Unless otherwise specified, we evaluated o1 on the maximal test-time compute setting.

Competition evals for Math (AIME 2024), Code (CodeForces), and PhD-Level Science Questions (GPQA Diamond): o1 greatly improves over GPT-4o on challenging reasoning benchmarks. Solid bars show pass@1 accuracy and the shaded region shows the performance of majority vote (consensus) with 64 samples.

Breakdown of accuracy and raw scores for GPT-4o vs. o1 on various ML benchmarks and exams: o1 improves over GPT-4o on a wide range of benchmarks, including 54/57 MMLU subcategories. Seven are shown for illustration.

In many reasoning-heavy benchmarks, o1 rivals the performance of human experts. Recent frontier models¹ do so well on MATH² and GSM8K that these benchmarks are no longer effective at differentiating models. We evaluated math performance on AIME, an exam designed to challenge the brightest high school math students in America. On the 2024 AIME exams, GPT-4o solved only 12% (1.8/15) of problems on average. o1 averaged 74% (11.1/15) with a single sample per problem, 83% (12.5/15) with consensus among 64 samples, and 93% (13.9/15) when re-ranking 1000 samples with a learned scoring function. A score of 13.9 places it among the top 500 students nationally and above the cutoff for the USA Mathematical Olympiad.
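The three AIME figures above come from three different ways of turning sampled answers into a score. The following is a minimal Python sketch of the distinction, assuming answers are compared as exact strings. The function names and the toy data are hypothetical, and the learned scoring function used for re-ranking is not described in this post, so it appears here only as a caller-supplied function.

```python
from collections import Counter
from typing import Callable, Sequence


def pass_at_1(samples: Sequence[str], correct: str) -> float:
    """Fraction of individual samples that match the reference answer."""
    return sum(s == correct for s in samples) / len(samples)


def consensus(samples: Sequence[str]) -> str:
    """Majority vote: return the most common answer among the samples."""
    return Counter(samples).most_common(1)[0][0]


def rerank(samples: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the sample that a (hypothetical) scoring function rates highest."""
    return max(samples, key=score)


# Toy data: five sampled answers to one problem whose correct answer is "204".
samples = ["204", "204", "113", "204", "809"]
single_sample_accuracy = pass_at_1(samples, "204")
majority_answer = consensus(samples)
```

In this toy case three of the five samples are correct, so the single-sample accuracy is below 1 while the majority vote still recovers the right answer, which is the effect the gap between pass@1 and consensus reflects.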

We also evaluated o1 on GPQA diamond, a difficult intelligence benchmark which tests for expertise in chemistry, physics and biology. In order to compare models to humans, we recruited experts with PhDs to answer GPQA-diamond questions. We found that o1 surpassed the performance of those human experts, becoming the first model to do so on this benchmark. These results do not imply that o1 is more capable than a PhD in all respects, only that the model is more proficient at solving some problems that a PhD would be expected to solve. On several other ML benchmarks, o1 improved over the state of the art. With its vision perception capabilities enabled, o1 scored 78.2% on MMMU, making it the first model to be competitive with human experts. It also outperformed GPT-4o on 54 out of 57 MMLU subcategories.

Chain of Thought

Similar to how a human may think for a long time before responding to a difficult question, o1 uses a chain of thought when attempting to solve a problem. Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses. It learns to recognize and correct its mistakes. It learns to break down tricky steps into simpler ones. It learns to try a different approach when the current one isn't working. This process dramatically improves the model's ability to reason. To illustrate this leap forward, we showcase the chain of thought from o1-preview on several difficult problems below.

GPT-4o

User

oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step

Use the example above to decode:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Let’s break this down step by step based on the example:

1. Example given:
• Input: oyfjdnisdr rtqwainr acxz mynzbhhx
• Output: Think step by step

By examining the words:
• The pattern involves selecting specific letters or transforming them.

2. Now, let’s decode the new phrase:
• Input: oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Step-by-step decoding process:

Identify possible word divisions.

The first phrase might follow the same structure as the example. It seems like the input text could be broken into groups based on natural separations or patterns.

A possible division could be:

oyekaijzdf
aaptcg
suaokybhai
ouow
aqht
mynznvaatzacdfoulxxz

However, to make a meaningful decoding, I would need more context about the transformations or letter shifting that might be involved.

Could you provide any additional decoding rules or transformations used in this cipher?

OpenAI o1-preview

User

oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step

Use the example above to decode:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Thought for 5 seconds

First, what is going on here?

We are given:

First, an example:

"oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step"

We can see that when "oyfjdnisdr rtqwainr acxz mynzbhhx" is transformed to "Think step by step"

Our task is: Use the example above to decode: "oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

So the first part is to figure out how "oyfjdnisdr rtqwainr acxz mynzbhhx" is decoded into "Think step by step"

Observation 1: The phrase "Think step by step" is 18 letters (excluding spaces) . Let's count the letters.

Similarly, let's count the letters in "oyfjdnisdr rtqwainr acxz mynzbhhx"

Original Ciphertext:

oyfjdnisdr rtqwainr acxz mynzbhhx

Plaintext:

Think step by step

First, break down the letters:

Ciphertext letters: o y f j d n i s d r r t q w a i n r a c x z m y n z b h h x

Total letters: Let's count the letters in the ciphertext.

First word: oyfjdnisdr (10 letters)

Second word: rtqwainr (8 letters)

Third word: acxz (4 letters)

Fourth word: mynzbhhx (8 letters)

Total letters in ciphertext: 10 + 8 + 4 + 8 = 30 letters

Plaintext: Think step by step

First word: Think (5 letters)

Second word: step (4 letters)

Third word: by (2 letters)

Fourth word: step (4 letters)

Total letters in plaintext: 5 + 4 + 2 + 4 = 15 letters

Hmm.

But actually in the problem it says the example:

[Ciphertext] -> Think step by step

So is the decoding process that the ciphertext maps to "Think step by step"?

Our goal is to use the example above to decode:

"oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

First, the cipher seems connected to the plaintext.

Given the time constraints, perhaps the easiest way is to try to see patterns.

Option 1: Try to find mapping from letters to letters.

Do any letters match?

First, let's write down the ciphertext and plaintext letters on top of each other.

Ciphertext: o y f j d n i s d r r t q w a i n r a c x z m y n z b h h x

Plaintext: T h i n k s t e p b y s t e p

Wait a minute.

I think maybe there is an anagram or substitution cipher here.

Alternatively, I think that we can notice that each group of ciphertext corresponds to a plaintext word.

Check the number of letters.

First ciphertext word: oyfjdnisdr (10 letters)

First plaintext word: Think (5 letters)

Hmm.

Second ciphertext word: rtqwainr (8 letters)

Second plaintext word: step (4 letters)

Third ciphertext word: acxz (4 letters)

Third plaintext word: by (2 letters)

Fourth ciphertext word: mynzbhhx (8 letters)

Fourth plaintext word: step (4 letters)

Interesting.

It seems that the ciphertext words are exactly twice as long as the plaintext words.

(10 vs 5, 8 vs 4, 4 vs 2, 8 vs 4)

Idea: Maybe we need to take every other letter or rebuild the plaintext from the ciphertext accordingly.

Let's test this theory.

If we take every other letter from the ciphertext words, maybe we'll get the plaintext.

Let's try this for the first word:

Ciphertext word: o y f j d n i s d r (10 letters)

Alternate letters:

First letters (positions 1,3,5,7,9): o f d i d

Second letters (positions 2,4,6,8,10): y j n s r

Wait: Let's try taking every second letter or first letter.

Option 1: Take the odd positions:

Positions: 1 3 5 7 9

Letters: o f d i d

Plaintext word is "Think", letters are T h i n k

Hmm.

Perhaps we can map letters:

o f d i d --> Think

Not directly

Option 2: Try mapping as per an assigned code: perhaps columns of letters?

Alternatively, perhaps the cipher is more complex.

Alternatively, notice that "oyfjdnisdr" has 10 letters and "Think" has 5 letters.

What if each plaintext letter is being represented by two ciphertext letters?

Similarly for other words.

Suppose that each plaintext letter corresponds to pair of letters in the ciphertext.

Meaning that the first two letters of ciphertext map to the first plaintext letter, and so on.

So for the first word, partition the ciphertext letters into pairs:

oy fj dn is dr

Corresponding to plaintext letters T h i n k

So we can attempt to map:

oy -> T

fj -> h

dn -> i

is -> n

dr -> k

If this is the mapping, perhaps each pair of letters maps to one plaintext letter.

Similarly for other words.

Let's test this mapping.

Is there a consistent mapping from pairs to letters?

Let's make a table of pairs and corresponding letters.

First, make the pairs from the first word and their corresponding plaintext letters:

Ciphertext Pairs -> Plaintext Letter

oy -> T

fj -> h

dn -> i

is -> n

dr -> k

Now, what's the mapping from "oy" to 'T'?

Is there a way to map 'oy' to 'T'?

Similarly for others.

One possible way is that 'oy' represents 'T'.

But how?

Option 1: Sum of the letters.

Convert letters to numbers (A=1, B=2, ..., Z=26)

So let's try to get to 'T' from 'o' and 'y'

o = 15

y = 25

Sum: 15 +25 = 40

But 'T' is 20.

Alternatively, perhaps subtract: 25 -15 = 10.

No.

Alternatively, perhaps combine the numbers in some way.

Alternatively, think about their positions in the alphabet.

Alternatively, perhaps the letters are encrypted via a code.

Alternatively, perhaps if we overlay the word 'Think' over the cipher pairs 'oy', 'fj', etc., the cipher is formed by substituting each plaintext letter with two letters.

Alternatively, perhaps consider the 'original' letters.

Let's look at the letters between the word "Think" and "oyfjdnisdr"

Perhaps we can spot that "oyfjdnisdr" is an anagram for "Thinkfiles"?

But that seems unlikely.

Alternatively, think of the pairs.

Suppose we assign numbers to letters: a=1, b=2, ..., z=26

Consider the numbers in 'oy':

o = 15

y = 25

Add them: 15 + 25 = 40

Now, for 'T' we have:

'T' is number 20

What if we take the sum of the cipher pair mod 26.

15 +25 = 40

Then 40 mod 26 = 14

14 corresponds to 'N'

Doesn't match.

Alternatively, see if the cipher pairs correspond to something else.

Alternatively, perhaps the pair 'oy' corresponds to 'T' via key.

Alternatively, perhaps it's as simple as 'oy' is 'T'.

But 'o' corresponds to 'T', subtracting 'o' ASCII code from 'T'.

Now, let's think in terms of symmetric letters.

Alternatively, perhaps we can spot that the letters correspond to the first letters of words.

But perhaps an easier way is to look at the word positions.

Alternatively, perhaps we can see that the ciphertext letters are offsets from the plaintext letters.

So let's try to map the pairs to the plaintext letters.

First pair 'oy' and plaintext letter 'T'

Letter 'o' is 15, 'T' is 20

Letter 'o' + 5 = 'T' (15 +5=20)

But 'y' is 25, which is 5 letters after 'T' (20 +5=25)

Interesting, seems that 'o' +5 = 'T'

Similarly, 'y' +1 = 'Z', which is not 'T'.

Alternatively, maybe the average of the two letters corresponds to 'T'

(15 +25)/2 =20, which is 'T'

Wait a minute, that seems promising.

First pair: 'o' (15) + 'y' (25) = 40

40 /2 =20

20 corresponds to 'T'

So perhaps the average of the letters in the pair corresponds to the plaintext letter.

Let's check this with the second pair.

Second pair: 'fj' corresponding to 'h'

Letters 'f'=6, 'j'=10

Sum: 6+10=16

Average:16/2=8

8 corresponds to 'h' (since 'h' is 8)

Success!

Third pair: 'dn' to 'i'

'd'=4, 'n'=14

Sum:4+14=18

Average:18/2=9

9 corresponds to 'i'(9='i')

But 'i' is 9, so that seems off by 1.

So perhaps we need to think carefully about letters.

Wait, 18/2=9, 9 corresponds to 'I'

So this works.

Fourth pair: 'is' corresponding to 'n'

'i'=9, 's'=19

Sum:9+19=28

Average:28/2=14

14 corresponds to 'n'(14='N')

Yes!

Fifth pair: 'dr' corresponds to 'k'

'd'=4, 'r'=18

Sum:4+18=22

Average:22/2=11

11 corresponds to 'k'(11='K')

Perfect!

So our code is: For each pair, sum their numeric values, divide by 2, get the corresponding letter.

Alternatively, the average of the numeric values of the letters corresponds to the plaintext letter.

This seems to be a valid method.

Now let's test this with the other words.

Second word ciphertext: rtqwainr

Ciphertext pairs:

rt -> s

qw -> ?

ai -> ?

nr -> ?

Corresponding to plaintext letters 's', 't', 'e', 'p'

So let's take the pairs:

1. 'rt' corresponds to 's'

r=18, t=20

Sum:38

Average:19

19 corresponds to 's' (19='S')

Good

2. 'qw' corresponds to 't'

q=17, w=23

Sum:17+23=40

Average:20

20 corresponds to 't' (20='T')

Good

3. 'ai' corresponds to 'e'

a=1, i=9

Sum:10

Average:5

5 corresponds to 'e' (5='E')

Good

4. 'nr' corresponds to 'p'

n=14, r=18

Sum:32

Average:16

16 corresponds to 'p' (16='P')

Perfect!

Third word: acxz (4 letters)

Plaintext word: 'by' (2 letters)

Ciphertext pairs:

a c

x z

Plaintext letters: 'b', 'y'

First pair: 'a', 'c'

a=1, c=3

Sum:4

Average:2

2 corresponds to 'b'(2='B')

Second pair: 'x' and 'z'

x=24, z=26

Sum:50

Average:25

25 corresponds to 'y'(25='Y')

Yes!

Fourth word: mynzbhhx (8 letters)

Plaintext word: 'step' (4 letters)

Ciphertext pairs:

my

nz

bh

hx

Plaintext letters 's','t','e','p'

First pair 'm','y'

m=13, y=25

Sum:38

Average:19

19 corresponds to 's'(19='S')

Second pair: 'n','z'

n=14, z=26

Sum: 40

Average:20

20='t'

Third pair: 'b','h'

b=2, h=8

Sum:10

Average:5

5='e'

Fourth pair: 'h','x'

h=8, x=24

Sum:32

Average:16

16='p'

Got it!

Now, given this, we can decode the given message.

Ciphertext to decode:

"oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

First, we need to divide into words.

Given that the ciphertext in the example had words of length:

-10 letters (5 pairs)
-8 letters (4 pairs)
-4 letters (2 pairs)
-8 letters (4 pairs)

Transported to plaintext words of:

-5 letters
-4 letters
-2 letters
-4 letters

So perhaps in our supplied ciphertext, we need to divide the letters into words accordingly.

Ciphertext: o y e k a i j z d f a a p t c g s u a o k y b h a i o u o w a q h t m y n z n v a a t z a c d f o u l x x z

First, let's write the letters in order:

1) oyekaijzdf (10 letters) -> word of 5 letters

2) aaptcg (6 letters) -> word of 3 letters

3) suaokybhai (10 letters) -> word of 5 letters

4) ouow (4 letters) -> word of 2 letters

5) aqht (4 letters) -> word of 2 letters

6) mynznvaatzacdfoulxxz (22 letters) -> word of 11 letters

Totals:

So the letters are:

First word: oyekaijzdf (10 letters = 5 pairs)

Second word: aaptcg (6 letters=3 pairs)

Third word: suaokybhai (10 letters=5 pairs)

Fourth word: ouow (4 letters=2 pairs)

Fifth word: aqht (4 letters=2 pairs)

Sixth word: mynznvaatzacdfoulxxz (22 letters=11 pairs)

So we can suppose that the plaintext words have lengths:

1) 5 letters

2) 3 letters

3) 5 letters

4) 2 letters

5) 2 letters

6)11 letters

Now we can proceed to decode.

First word: oyekaijzdf

Divide into pairs:

oy

ek

ai

jz

df

So we can find the plaintext letters by averaging the numeric values of the letters in each pair.

First pair: 'o','y'

o=15, y=25

Sum=40

Average=20

20='T'

Second pair: 'e','k'

e=5, k=11

Sum=16

Average=8

8='H'

Third pair: 'a','i'

a=1, i=9

Sum=10

Average=5

5='E'

Fourth pair: 'j','z'

j=10, z=26

Sum=36

Average=18

18='R'

Fifth pair: 'd','f'

d=4, f=6

Sum=10

Average=5

5='E'

So the first word is 'T','H','E','R','E'

So 'THERE'

Second word: aaptcg

Pairs:

aa

pt

cg

First pair: 'a','a'

a=1, a=1

Sum=2

Average=1

1='A'

Second pair: 'p','t'

p=16, t=20

Sum=36

Average=18

18='R'

Third pair: 'c','g'

c=3, g=7

Sum=10

Average=5

5='E'

So second word: 'A','R','E'

Third word: suaokybhai

Pairs:

su

ao

ky

bh

ai

First pair: s=19, u=21

Sum=40, Average=20

20='T'

Second pair: a=1, o=15

Sum=16, Average=8

8='H'

Third pair: k=11, y=25

Sum=36, Average=18

18='R'

Fourth pair: b=2, h=8

Sum=10, Average=5

5='E'

Fifth pair: a=1, i=9

Sum=10, Average=5

5='E'

Third word: 'T','H','R','E','E'

Fourth word: ouow

Pairs: ou, ow

First pair: o=15, u=21

Sum=36, Average=18

18='R'

Second pair: o=15, w=23

Sum=38, Average=19

19='S'

Fourth word: 'R','S'

Fifth word: aqht

Pairs: a q, h t

First pair: a=1, q=17

Sum=18, Average=9

9='I'

Second pair: h=8, t=20

Sum=28, Average=14

14='N'

Fifth word: 'I','N'

Sixth word: mynznvaatzacdfoulxxz (22 letters: 11 pairs)

Pairs:

m y

n z

n v

a a

t z

a c

d f

o u

l x

x z

Let's list them properly.

Wait, earlier I missed some letters there.

Let's re-express the sixth word letters:

m y n z n v a a t z a c d f o u l x x z

Total letters: 22 letters

Pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

Wait, that's 21 letters.

But in total, we have 22 letters; I think I have a mismatch.

Wait, in the problem the sixth word is:

mynznvaatzacdfoulxxz

Let's separate the letters:

m y n z n v a a t z a c d f o u l x x z

Total letters: 22 letters

So pairs would be:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

Wait, 'z' is only one letter at the end.

So maybe the letters are:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z (left alone?)

But then we're missing a letter.

Alternatively, perhaps the 'z' at the end is an extra letter or it's paired with the previous 'x'.

Alternatively, perhaps there is a mistake.

Alternatively, perhaps when we reach the last 'x x z', we should write:

9. l x

10. x z

Total pairs: 11

So updated pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x z

Now total pairs: 10 pairs. But we are supposed to have 11 letters in the plaintext word.

So perhaps I have miscounted and not considered including 'f' or 'z'.

Wait, is there any chance that the sequence is:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

We have 22 letters.

Perhaps the 'x x' and 'z' are overlapping.

But we need 11 pairs.

I think for this to work, we need to create 11 pairs from the 22 letters.

So let's try pairing letters in order, taking two letters at a time.

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z ?

But again, last pair only has one letter 'z'

Alternatively, we might need to adjust the pairs.

Perhaps it's 'ou', 'lx', 'xz', and there is no extra 'x'.

Maybe I miswrote.

Wait, let's write the letters as they come:

m y n z n v a a t z a c d f o u l x x z

Positions:

1 m

2 y

3 n

4 z

5 n

6 v

7 a

8 a

9 t

10 z

11 a

12 c

13 d

14 f

15 o

16 u

17 l

18 x

19 x

20 z

So 20 letters.

Now that makes sense.

Now we can divide into pairs:

Pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x z

Total of 10 pairs, which will correspond to 10 letters.

But earlier we thought the word was 11 letters.

But 10 letters may be acceptable.

Maybe we should proceed with the 10 pairs.

First pair: m y

m=13, y=25

Sum=38

Average=19

19='S'

Second pair: n z

n=14, z=26

Sum=40

Average=20

20='T'

Third pair: n v

n=14, v=22

Sum=36

Average=18

18='R'

Fourth pair: a a

a=1, a=1

Sum=2

Average=1

1='A'

Fifth pair: t z

t=20, z=26

Sum=46

Average=23

23='W'

Sixth pair: a c

a=1, c=3

Sum=4

Average=2

2='B'

Seventh pair: d f

d=4, f=6

Sum=10

Average=5

5='E'

Eighth pair: o u

o=15, u=21

Sum=36

Average=18

18='R'

Ninth pair: l x

l=12, x=24

Sum=36

Average=18

18='R'

Tenth pair: x z

x=24, z=26

Sum=50

Average=25

25='Y'

So the tenth word is 'S', 'T', 'R', 'A', 'W', 'B', 'E', 'R', 'R', 'Y'

Wait that's 10 letters

So the sixth word is 'STRAWBERRY', which makes sense.

So putting all these translated letters together:

First word: 'THERE'

Second word: 'ARE'

Third word: 'THREE'

Fourth word: 'RS' (? Doesn't make sense)

Wait, the fourth word, the pairs were:

ou ow

Pairs:

First pair: o u

o=15, u=21

Sum=36, Average=18

18='R'

Second pair: o w

o=15, w=23

Sum=38, Average=19

19='S'

So we get 'R','S'

Similarly, fifth word 'IN'

'R', 'S' 'I','N'

So the message so far is

'THERE ARE THREE RS IN'

And then 'STRAWBERRY'

Wait, let's put the words together:

'THERE ARE THREE Rs IN STRAWBERRY'

I think 'RS' is 'R's', meaning 'Rs'

So the message is:

'THERE ARE THREE R'S IN STRAWBERRY'

Which is true.

So the decoded message is:

"THERE ARE THREE R'S IN STRAWBERRY"

To decode the given ciphertext using the example provided, we’ll use the pattern identified in the example.

Ciphertext Example:

oyfjdnisdr rtqwainr acxz mynzbhhx

Decoded as:

Think step by step

Decoding Method:

1. Pair the letters in the ciphertext.
2. Convert each letter to its numerical position in the alphabet (A=1, B=2, …, Z=26).
3. Sum the numerical values of each pair.
4. Compute the average of the sum (divide by 2).
5. Convert the average back to a letter (1=A, 2=B, …, 26=Z).

Apply this method to the given ciphertext:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Break down the ciphertext into pairs:

1. First word: “oyekaijzdf”
  • Pairs: oy, ek, ai, jz, df
  • Decoded letters:
    oy → (15+25)/2 = 20 → T
    ek → (5+11)/2 = 8 → H
    ai → (1+9)/2 = 5 → E
    jz → (10+26)/2 = 18 → R
    df → (4+6)/2 = 5 → E
  • Decoded word: THERE
2. Second word: “aaptcg”
  • Pairs: aa, pt, cg
  • Decoded letters:
    aa → (1+1)/2 = 1 → A
    pt → (16+20)/2 = 18 → R
    cg → (3+7)/2 = 5 → E
  • Decoded word: ARE
3. Third word: “suaokybhai”
  • Pairs: su, ao, ky, bh, ai
  • Decoded letters:
    su → (19+21)/2 = 20 → T
    ao → (1+15)/2 = 8 → H
    ky → (11+25)/2 = 18 → R
    bh → (2+8)/2 = 5 → E
    ai → (1+9)/2 = 5 → E
  • Decoded word: THREE
4. Fourth word: “ouow”
  • Pairs: ou, ow
  • Decoded letters:
    ou → (15+21)/2 = 18 → R
    ow → (15+23)/2 = 19 → S
  • Decoded letters: R’S
5. Fifth word: “aqht”
  • Pairs: aq, ht
  • Decoded letters:
    aq → (1+17)/2 = 9 → I
    ht → (8+20)/2 = 14 → N
  • Decoded word: IN
6. Sixth word: “mynznvaatzacdfoulxxz”
  • Pairs: my, nz, nv, aa, tz, ac, df, ou, lx, xz
  • Decoded letters:
    my → (13+25)/2 = 19 → S
    nz → (14+26)/2 = 20 → T
    nv → (14+22)/2 = 18 → R
    aa → (1+1)/2 = 1 → A
    tz → (20+26)/2 = 23 → W
    ac → (1+3)/2 = 2 → B
    df → (4+6)/2 = 5 → E
    ou → (15+21)/2 = 18 → R
    lx → (12+24)/2 = 18 → R
    xz → (24+26)/2 = 25 → Y
  • Decoded word: STRAWBERRY

Final Decoded Message:

THERE ARE THREE R'S IN STRAWBERRY

Answer:
THERE ARE THREE R’S IN STRAWBERRY
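The pair-averaging rule that o1-preview arrives at in the transcript above can be checked mechanically. The following is a minimal Python sketch of that rule, assuming lowercase a to z input with words separated by spaces; the function name `decode` is hypothetical. Apostrophes are not encoded in the ciphertext, so the fourth word comes out as "RS".

```python
def decode(ciphertext: str) -> str:
    """Decode by averaging the alphabet positions (a=1 ... z=26) of each letter pair."""
    decoded_words = []
    for word in ciphertext.lower().split():
        if len(word) % 2 != 0:
            raise ValueError(f"word {word!r} cannot be split into pairs")
        letters = []
        for i in range(0, len(word), 2):
            first = ord(word[i]) - ord("a") + 1
            second = ord(word[i + 1]) - ord("a") + 1
            total = first + second
            if total % 2 != 0:
                # The rule only yields a letter when the pair sum is even.
                raise ValueError(f"pair {word[i:i + 2]!r} has no whole-number average")
            letters.append(chr(total // 2 + ord("A") - 1))
        decoded_words.append("".join(letters))
    return " ".join(decoded_words)


example = decode("oyfjdnisdr rtqwainr acxz mynzbhhx")
message = decode("oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz")
```

Running it on the example ciphertext reproduces "THINK STEP BY STEP" in upper case, which confirms that the rule is consistent with the worked example before it is applied to the new message.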

Coding

We trained a model that scored 213 points and ranked in the 49th percentile in the 2024 International Olympiad in Informatics (IOI), by initializing from o1 and training to further improve programming skills. This model competed in the 2024 IOI under the same conditions as the human contestants. It had ten hours to solve six challenging algorithmic problems and was allowed 50 submissions per problem.

For each problem, our system sampled many candidate submissions and submitted 50 of them based on a test-time selection strategy. Submissions were selected based on performance on the IOI public test cases, model-generated test cases, and a learned scoring function. If we had instead submitted at random, we would have only scored 156 points on average, suggesting that this strategy was worth nearly 60 points under competition constraints.

With a relaxed submission constraint, we found that model performance improved significantly. When allowed 10,000 submissions per problem, the model achieved a score of 362.14, above the gold medal threshold, even without any test-time selection strategy.

Finally, we simulated competitive programming contests hosted by Codeforces to demonstrate this model's coding skill. Our evaluations closely matched competition rules and allowed for 10 submissions. GPT-4o achieved an Elo rating³ of 808, which is in the 11th percentile of human competitors. This model far exceeded both GPT-4o and o1: it achieved an Elo rating of 1807, performing better than 93% of competitors.
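A test-time selection step of this kind can be pictured as ranking candidates by the available signals and keeping the top 50. The Python sketch below is only an illustration under stated assumptions: the post names the three signals but does not say how they are combined, so the lexicographic ordering, the `Candidate` fields, and the function name `select_submissions` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    code: str
    public_pass_rate: float     # fraction of public test cases passed
    generated_pass_rate: float  # fraction of model-generated test cases passed
    learned_score: float        # output of a learned scoring function


def select_submissions(candidates: List[Candidate], budget: int = 50) -> List[Candidate]:
    """Keep the `budget` best candidates.

    Assumption: signals are compared lexicographically (public tests first,
    then generated tests, then the learned score). The real combination rule
    is not given in the post.
    """
    ranked = sorted(
        candidates,
        key=lambda c: (c.public_pass_rate, c.generated_pass_rate, c.learned_score),
        reverse=True,
    )
    return ranked[:budget]
```

Whatever the actual rule, the reported numbers give its value directly: 213 points with selection against 156 with random submission is a difference of 57 points.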
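To give a rough sense of what a rating gap of this size means, the sketch below uses the classic Elo expected-score formula. This is an assumption for illustration: Codeforces computes ratings with its own Elo-style system (see footnote 3), and the function name `expected_score` is hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Classic Elo expected score of player A against player B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Ratings reported in the post: the fine-tuned model (1807) and GPT-4o (808).
gap_expectation = expected_score(1807, 808)
```

Under the classic formula, equal ratings give an expected score of 0.5, and every 400-point gap multiplies the odds in favor of the stronger player by ten.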

The image shows a bar chart comparing Codeforces Elo percentile rankings for different models. GPT-4o has 808 Elo (11th percentile), o1 preview has 1258 Elo (62nd percentile), o1 has 1673 Elo (89th percentile), and o1-ioi has 1807 Elo (93rd percentile).

Further fine-tuning on programming competitions improves o1. The improved model ranked in the 49th percentile in the 2024 International Olympiad in Informatics under competition rules.

Human preference evaluation

In addition to exams and academic benchmarks, we also evaluated human preference of o1-preview vs GPT-4o on challenging, open-ended prompts in a broad spectrum of domains. In this evaluation, human trainers were shown anonymized responses to a prompt from o1-preview and GPT-4o, and voted for which response they preferred. o1-preview is preferred to GPT-4o by a large margin in reasoning-heavy categories like data analysis, coding, and math. However, o1-preview is not preferred on some natural language tasks, suggesting that it is not well-suited for all use cases.

Safety

Chain of thought reasoning provides new opportunities for alignment and safety. We found that integrating our policies for model behavior into the chain of thought of a reasoning model is an effective way to robustly teach human values and principles. By teaching the model our safety rules and how to reason about them in context, we found evidence of reasoning capability directly benefiting model robustness: o1-preview achieved substantially improved performance on key jailbreak evaluations and on our hardest internal benchmarks for evaluating our model's safety refusal boundaries. We believe that using a chain of thought offers significant advances for safety and alignment because (1) it enables us to observe the model thinking in a legible way, and (2) the model's reasoning about safety rules is more robust to out-of-distribution scenarios.

To stress-test our improvements, we conducted a suite of safety tests and red-teaming before deployment, in accordance with our Preparedness Framework. We found that chain of thought reasoning contributed to capability improvements across our evaluations. Of particular note, we observed interesting instances of reward hacking. Detailed results from these evaluations can be found in the accompanying System Card.

Metric	GPT-4o	o1-preview
% Safe completions on harmful prompts, Standard	0.990	0.995
% Safe completions on harmful prompts, Challenging: jailbreaks and edge cases	0.714	0.934
↳ Harassment (severe)	0.845	0.900
↳ Exploitative sexual content	0.483	0.949
↳ Sexual content involving minors	0.707	0.931
↳ Advice about non-violent wrongdoing	0.688	0.961
↳ Advice about violent wrongdoing	0.778	0.963
% Safe completions for top 200 with highest Moderation API scores per category in WildChat (Zhao, et al. 2024)	0.945	0.971
Goodness@0.1 StrongREJECT jailbreak eval (Souly et al. 2024)	0.220	0.840
Human-sourced jailbreak eval	0.770	0.960
% Compliance on internal benign edge cases, “not over-refusal”	0.910	0.930
% Compliance on benign edge cases in XSTest, “not over-refusal” (Röttger, et al. 2023)	0.924	0.976

Hiding the Chains of Thought

We believe that a hidden chain of thought presents a unique opportunity for monitoring models. Assuming it is faithful and legible, the hidden chain of thought allows us to "read the mind" of the model and understand its thought process. For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user. However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.

Therefore, after weighing multiple factors including user experience, competitive advantage, and the option to pursue chain of thought monitoring, we have decided not to show the raw chains of thought to users. We acknowledge this decision has disadvantages. We strive to partially make up for it by teaching the model to reproduce any useful ideas from the chain of thought in the answer. For the o1 model series we show a model-generated summary of the chain of thought.

Conclusion

o1 significantly advances the state of the art in AI reasoning. We plan to release improved versions of this model as we continue iterating. We expect these new reasoning capabilities will improve our ability to align models to human values and principles. We believe o1, and its successors, will unlock many new use cases for AI in science, coding, math, and related fields. We are excited for users and API developers to discover how it can improve their daily work.

Appendix A

Dataset	Metric	gpt-4o	o1-preview	o1
Competition Math AIME (2024)	cons@64	13.4	56.7	83.3
Competition Math AIME (2024)	pass@1	9.3	44.6	74.4
Competition Code CodeForces	Elo	808	1,258	1,673
Competition Code CodeForces	Percentile	11.0	62.0	89.0
GPQA Diamond	cons@64	56.1	78.3	78.0
GPQA Diamond	pass@1	50.6	73.3	77.3
Biology	cons@64	63.2	73.7	68.4
Biology	pass@1	61.6	65.9	69.2
Chemistry	cons@64	43.0	60.2	65.6
Chemistry	pass@1	40.2	59.9	64.7
Physics	cons@64	68.6	89.5	94.2
Physics	pass@1	59.5	89.4	92.8
MATH	pass@1	60.3	85.5	94.8
MMLU	pass@1	88.0	92.3	90.8
MMMU (val)	pass@1	69.1	n/a	78.2
MathVista (testmini)	pass@1	63.8	n/a	73.9

Authors

OpenAI


References

1
https://www.anthropic.com/news/claude-3-5-sonnet, https://deepmind.google/technologies/gemini/pro
2
Our evaluations used the same 500-problem test split found in https://arxiv.org/abs/2305.20050
3
https://codeforces.com/blog/entry/68288