September 12, 2024

Learning to reason with LLMs


OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA). While the work needed to make this new model as easy to use as current models is still ongoing, we are releasing an early version of this model, OpenAI o1‑preview, for immediate use in ChatGPT and to trusted API users.

Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute). The constraints on scaling this approach differ substantially from those of LLM pretraining, and we are continuing to investigate them.

The image shows two scatter plots comparing "o1 AIME accuracy" during training and at test time. Both charts have "pass@1 accuracy" on the y-axis and compute (log scale) on the x-axis. The dots indicate increasing accuracy with more compute time.

o1 performance smoothly improves with both train-time and test-time compute

Evals

To highlight the reasoning improvement over GPT‑4o, we tested our models on a diverse set of human exams and ML benchmarks. We show that o1 significantly outperforms GPT‑4o on the vast majority of these reasoning-heavy tasks. Unless otherwise specified, we evaluated o1 on the maximal test-time compute setting.

Competition evals for Math (AIME 2024), Code (CodeForces), and PhD-Level Science Questions (GPQA Diamond): o1 greatly improves over GPT-4o on challenging reasoning benchmarks. Solid bars show pass@1 accuracy and the shaded region shows the performance of majority vote (consensus) with 64 samples.

Breakdown of accuracy and raw scores for GPT-4o and o1 across competition evals, ML benchmarks, and exams: o1 improves over GPT-4o on a wide range of benchmarks, including 54/57 MMLU subcategories. Seven are shown for illustration.

In many reasoning-heavy benchmarks, o1 rivals the performance of human experts. Recent frontier models¹ do so well on MATH² and GSM8K that these benchmarks are no longer effective at differentiating models. We evaluated math performance on AIME, an exam designed to challenge the brightest high school math students in America. On the 2024 AIME exams, GPT‑4o solved on average only 12% (1.8/15) of problems. o1 averaged 74% (11.1/15) with a single sample per problem, 83% (12.5/15) with consensus among 64 samples, and 93% (13.9/15) when re-ranking 1000 samples with a learned scoring function. A score of 13.9 places it among the top 500 students nationally and above the cutoff for the USA Mathematical Olympiad.
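The three numbers quoted for o1 correspond to three ways of turning sampled answers into a score: grading a single sample (pass@1), taking a majority vote over many samples (consensus), and picking the sample that a scoring function ranks highest (re-ranking). The Python sketch below illustrates the difference on toy data. The function names, the sample answers, and the `toy_scores` table are assumptions made for illustration; the post does not describe the learned scoring function, so a plain lookup table stands in for it here.

```python
from collections import Counter
from typing import Callable, Sequence


def pass_at_1(samples: Sequence[str], correct: str) -> float:
    """Fraction of independent samples whose answer equals the reference."""
    return sum(s == correct for s in samples) / len(samples)


def consensus(samples: Sequence[str]) -> str:
    """Majority vote: the most frequent answer among the samples."""
    return Counter(samples).most_common(1)[0][0]


def rerank(samples: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the sample that the scoring function ranks highest."""
    return max(samples, key=score)


# Toy data (made up): 8 sampled answers to one problem whose answer is "204".
samples = ["204", "117", "204", "204", "096", "204", "117", "204"]

# Hypothetical stand-in for a learned scoring function.
toy_scores = {"204": 0.9, "117": 0.4, "096": 0.1}

p1 = pass_at_1(samples, "204")                     # average single-sample accuracy
voted = consensus(samples)                         # answer chosen by majority vote
best = rerank(samples, lambda s: toy_scores[s])    # answer chosen by re-ranking
```

In this toy case a single sample is right only part of the time, while the vote and the re-ranker both recover the correct answer, which is the effect the cons@64 and re-ranking figures measure at scale.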

We also evaluated o1 on GPQA diamond, a difficult intelligence benchmark which tests for expertise in chemistry, physics and biology. In order to compare models to humans, we recruited experts with PhDs to answer GPQA-diamond questions. We found that o1 surpassed the performance of those human experts, becoming the first model to do so on this benchmark. These results do not imply that o1 is more capable than a PhD in all respects, only that the model is more proficient in solving some problems that a PhD would be expected to solve. On several other ML benchmarks, o1 improved over the state-of-the-art. With its vision perception capabilities enabled, o1 scored 78.2% on MMMU, making it the first model to be competitive with human experts. It also outperformed GPT‑4o on 54 out of 57 MMLU subcategories.

Chain of Thought

Similar to how a human may think for a long time before responding to a difficult question, o1 uses a chain of thought when attempting to solve a problem. Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses. It learns to recognize and correct its mistakes. It learns to break down tricky steps into simpler ones. It learns to try a different approach when the current one isn't working. This process dramatically improves the model's ability to reason. To illustrate this leap forward, we showcase the chain of thought from o1‑preview on several difficult problems below.

GPT-4o

User

oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step

Use the example above to decode:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Let’s break this down step by step based on the example:

1. Example given:
• Input: oyfjdnisdr rtqwainr acxz mynzbhhx
• Output: Think step by step

By examining the words:
• The pattern involves selecting specific letters or transforming them.

2. Now, let’s decode the new phrase:
• Input: oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Step-by-step decoding process:

Identify possible word divisions.

The first phrase might follow the same structure as the example. It seems like the input text could be broken into groups based on natural separations or patterns.

A possible division could be:

oyekaijzdf
aaptcg
suaokybhai
ouow
aqht
mynznvaatzacdfoulxxz

However, to make a meaningful decoding, I would need more context about the transformations or letter shifting that might be involved.

Could you provide any additional decoding rules or transformations used in this cipher?

OpenAI o1-preview

User

oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step

Use the example above to decode:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Thought for 5 seconds

First, what is going on here?

We are given:

First, an example:

"oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step"

We can see that when "oyfjdnisdr rtqwainr acxz mynzbhhx" is transformed to "Think step by step"

Our task is: Use the example above to decode: "oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

So the first part is to figure out how "oyfjdnisdr rtqwainr acxz mynzbhhx" is decoded into "Think step by step"

Observation 1: The phrase "Think step by step" is 18 letters (excluding spaces) . Let's count the letters.

Similarly, let's count the letters in "oyfjdnisdr rtqwainr acxz mynzbhhx"

Original Ciphertext:

oyfjdnisdr rtqwainr acxz mynzbhhx

Plaintext:

Think step by step

First, break down the letters:

Ciphertext letters: o y f j d n i s d r r t q w a i n r a c x z m y n z b h h x

Total letters: Let's count the letters in the ciphertext.

First word: oyfjdnisdr (10 letters)

Second word: rtqwainr (8 letters)

Third word: acxz (4 letters)

Fourth word: mynzbhhx (8 letters)

Total letters in ciphertext: 10 + 8 + 4 + 8 = 30 letters

Plaintext: Think step by step

First word: Think (5 letters)

Second word: step (4 letters)

Third word: by (2 letters)

Fourth word: step (4 letters)

Total letters in plaintext: 5 + 4 + 2 + 4 = 15 letters

Hmm.

But actually in the problem it says the example:

[Ciphertext] -> Think step by step

So is the decoding process that the ciphertext maps to "Think step by step"?

Our goal is to use the example above to decode:

"oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

First, the cipher seems connected to the plaintext.

Given the time constraints, perhaps the easiest way is to try to see patterns.

Option 1: Try to find mapping from letters to letters.

Do any letters match?

First, let's write down the ciphertext and plaintext letters on top of each other.

Ciphertext: o y f j d n i s d r r t q w a i n r a c x z m y n z b h h x

Plaintext: T h i n k s t e p b y s t e p

Wait a minute.

I think maybe there is an anagram or substitution cipher here.

Alternatively, I think that we can notice that each group of ciphertext corresponds to a plaintext word.

Check the number of letters.

First ciphertext word: oyfjdnisdr (10 letters)

First plaintext word: Think (5 letters)

Hmm.

Second ciphertext word: rtqwainr (8 letters)

Second plaintext word: step (4 letters)

Third ciphertext word: acxz (4 letters)

Third plaintext word: by (2 letters)

Fourth ciphertext word: mynzbhhx (8 letters)

Fourth plaintext word: step (4 letters)

Interesting.

It seems that the ciphertext words are exactly twice as long as the plaintext words.

(10 vs 5, 8 vs 4, 4 vs 2, 8 vs 4)

Idea: Maybe we need to take every other letter or rebuild the plaintext from the ciphertext accordingly.

Let's test this theory.

If we take every other letter from the ciphertext words, maybe we'll get the plaintext.

Let's try this for the first word:

Ciphertext word: o y f j d n i s d r (10 letters)

Alternate letters:

First letters (positions 1,3,5,7,9): o f d i d

Second letters (positions 2,4,6,8,10): y j n s r

Wait: Let's try taking every second letter or first letter.

Option 1: Take the odd positions:

Positions: 1 3 5 7 9

Letters: o f d i d

Plaintext word is "Think", letters are T h i n k

Hmm.

Perhaps we can map letters:

o f d i d --> Think

Not directly

Option 2: Try mapping as per an assigned code: perhaps columns of letters?

Alternatively, perhaps the cipher is more complex.

Alternatively, notice that "oyfjdnisdr" has 10 letters and "Think" has 5 letters.

What if each plaintext letter is being represented by two ciphertext letters?

Similarly for other words.

Suppose that each plaintext letter corresponds to pair of letters in the ciphertext.

Meaning that the first two letters of ciphertext map to the first plaintext letter, and so on.

So for the first word, partition the ciphertext letters into pairs:

oy fj dn is dr

Corresponding to plaintext letters T h i n k

So we can attempt to map:

oy -> T

fj -> h

dn -> i

is -> n

dr -> k

If this is the mapping, perhaps each pair of letters maps to one plaintext letter.

Similarly for other words.

Let's test this mapping.

Is there a consistent mapping from pairs to letters?

Let's make a table of pairs and corresponding letters.

First, make the pairs from the first word and their corresponding plaintext letters:

Ciphertext Pairs -> Plaintext Letter

oy -> T

fj -> h

dn -> i

is -> n

dr -> k

Now, what's the mapping from "oy" to 'T'?

Is there a way to map 'oy' to 'T'?

Similarly for others.

One possible way is that 'oy' represents 'T'.

But how?

Option 1: Sum of the letters.

Convert letters to numbers (A=1, B=2, ..., Z=26)

So let's try to get to 'T' from 'o' and 'y'

o = 15

y = 25

Sum: 15 +25 = 40

But 'T' is 20.

Alternatively, perhaps subtract: 25 -15 = 10.

No.

Alternatively, perhaps combine the numbers in some way.

Alternatively, think about their positions in the alphabet.

Alternatively, perhaps the letters are encrypted via a code.

Alternatively, perhaps if we overlay the word 'Think' over the cipher pairs 'oy', 'fj', etc., the cipher is formed by substituting each plaintext letter with two letters.

Alternatively, perhaps consider the 'original' letters.

Let's look at the letters between the word "Think" and "oyfjdnisdr"

Perhaps we can spot that "oyfjdnisdr" is an anagram for "Thinkfiles"?

But that seems unlikely.

Alternatively, think of the pairs.

Suppose we assign numbers to letters: a=1, b=2, ..., z=26

Consider the numbers in 'oy':

o = 15

y = 25

Add them: 15 + 25 = 40

Now, for 'T' we have:

'T' is number 20

What if we take the sum of the cipher pair mod 26.

15 +25 = 40

Then 40 mod 26 = 14

14 corresponds to 'N'

Doesn't match.

Alternatively, see if the cipher pairs correspond to something else.

Alternatively, perhaps the pair 'oy' corresponds to 'T' via key.

Alternatively, perhaps it's as simple as 'oy' is 'T'.

But 'o' corresponds to 'T', subtracting 'o' ASCII code from 'T'.

Now, let's think in terms of symmetric letters.

Alternatively, perhaps we can spot that the letters correspond to the first letters of words.

But perhaps an easier way is to look at the word positions.

Alternatively, perhaps we can see that the ciphertext letters are offsets from the plaintext letters.

So let's try to map the pairs to the plaintext letters.

First pair 'oy' and plaintext letter 'T'

Letter 'o' is 15, 'T' is 20

Letter 'o' + 5 = 'T' (15 +5=20)

But 'y' is 25, which is 5 letters after 'T' (20 +5=25)

Interesting, seems that 'o' +5 = 'T'

Similarly, 'y' +1 = 'Z', which is not 'T'.

Alternatively, maybe the average of the two letters corresponds to 'T'

(15 +25)/2 =20, which is 'T'

Wait a minute, that seems promising.

First pair: 'o' (15) + 'y' (25) = 40

40 /2 =20

20 corresponds to 'T'

So perhaps the average of the letters in the pair corresponds to the plaintext letter.

Let's check this with the second pair.

Second pair: 'fj' corresponding to 'h'

Letters 'f'=6, 'j'=10

Sum: 6+10=16

Average:16/2=8

8 corresponds to 'h' (since 'h' is 8)

Success!

Third pair: 'dn' to 'i'

'd'=4, 'n'=14

Sum:4+14=18

Average:18/2=9

9 corresponds to 'i'(9='i')

But 'i' is 9, so that seems off by 1.

So perhaps we need to think carefully about letters.

Wait, 18/2=9, 9 corresponds to 'I'

So this works.

Fourth pair: 'is' corresponding to 'n'

'i'=9, 's'=19

Sum:9+19=28

Average:28/2=14

14 corresponds to 'n'(14='N')

Yes!

Fifth pair: 'dr' corresponds to 'k'

'd'=4, 'r'=18

Sum:4+18=22

Average:22/2=11

11 corresponds to 'k'(11='K')

Perfect!

So our code is: For each pair, sum their numeric values, divide by 2, get the corresponding letter.

Alternatively, the average of the numeric values of the letters corresponds to the plaintext letter.

This seems to be a valid method.

Now let's test this with the other words.

Second word ciphertext: rtqwainr

Ciphertext pairs:

rt -> s

qw -> ?

ai -> ?

nr -> ?

Corresponding to plaintext letters 's', 't', 'e', 'p'

So let's take the pairs:

1. 'rt' corresponds to 's'

r=18, t=20

Sum:38

Average:19

19 corresponds to 's' (19='S')

Good

2. 'qw' corresponds to 't'

q=17, w=23

Sum:17+23=40

Average:20

20 corresponds to 't' (20='T')

Good

3. 'ai' corresponds to 'e'

a=1, i=9

Sum:10

Average:5

5 corresponds to 'e' (5='E')

Good

4. 'nr' corresponds to 'p'

n=14, r=18

Sum:32

Average:16

16 corresponds to 'p' (16='P')

Perfect!

Third word: acxz (4 letters)

Plaintext word: 'by' (2 letters)

Ciphertext pairs:

a c

x z

Plaintext letters: 'b', 'y'

First pair: 'a', 'c'

a=1, c=3

Sum:4

Average:2

2 corresponds to 'b'(2='B')

Second pair: 'x' and 'z'

x=24, z=26

Sum:50

Average:25

25 corresponds to 'y'(25='Y')

Yes!

Fourth word: mynzbhhx (8 letters)

Plaintext word: 'step' (4 letters)

Ciphertext pairs:

my

nz

bh

hx

Plaintext letters 's','t','e','p'

First pair 'm','y'

m=13, y=25

Sum:38

Average:19

19 corresponds to 's'(19='S')

Second pair: 'n','z'

n=14, z=26

Sum: 40

Average:20

20='t'

Third pair: 'b','h'

b=2, h=8

Sum:10

Average:5

5='e'

Fourth pair: 'h','x'

h=8, x=24

Sum:32

Average:16

16='p'

Got it!

Now, given this, we can decode the given message.

Ciphertext to decode:

"oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

First, we need to divide into words.

Given that the ciphertext in the example had words of length:

-10 letters (5 pairs)
-8 letters (4 pairs)
-4 letters (2 pairs)
-8 letters (4 pairs)

Transported to plaintext words of:

-5 letters
-4 letters
-2 letters
-4 letters

So perhaps in our supplied ciphertext, we need to divide the letters into words accordingly.

Ciphertext: o y e k a i j z d f a a p t c g s u a o k y b h a i o u o w a q h t m y n z n v a a t z a c d f o u l x x z

First, let's write the letters in order:

1) oyekaijzdf (10 letters) -> word of 5 letters

2) aaptcg (6 letters) -> word of 3 letters

3) suaokybhai (10 letters) -> word of 5 letters

4) ouow (4 letters) -> word of 2 letters

5) aqht (4 letters) -> word of 2 letters

6) mynznvaatzacdfoulxxz (22 letters) -> word of 11 letters

Totals:

So the letters are:

First word: oyekaijzdf (10 letters = 5 pairs)

Second word: aaptcg (6 letters=3 pairs)

Third word: suaokybhai (10 letters=5 pairs)

Fourth word: ouow (4 letters=2 pairs)

Fifth word: aqht (4 letters=2 pairs)

Sixth word: mynznvaatzacdfoulxxz (22 letters=11 pairs)

So we can suppose that the plaintext words have lengths:

1) 5 letters

2) 3 letters

3) 5 letters

4) 2 letters

5) 2 letters

6)11 letters

Now we can proceed to decode.

First word: oyekaijzdf

Divide into pairs:

oy

ek

ai

jz

df

So we can find the plaintext letters by averaging the numeric values of the letters in each pair.

First pair: 'o','y'

o=15, y=25

Sum=40

Average=20

20='T'

Second pair: 'e','k'

e=5, k=11

Sum=16

Average=8

8='H'

Third pair: 'a','i'

a=1, i=9

Sum=10

Average=5

5='E'

Fourth pair: 'j','z'

j=10, z=26

Sum=36

Average=18

18='R'

Fifth pair: 'd','f'

d=4, f=6

Sum=10

Average=5

5='E'

So the first word is 'T','H','E','R','E'

So 'THERE'

Second word: aaptcg

Pairs:

aa

pt

cg

First pair: 'a','a'

a=1, a=1

Sum=2

Average=1

1='A'

Second pair: 'p','t'

p=16, t=20

Sum=36

Average=18

18='R'

Third pair: 'c','g'

c=3, g=7

Sum=10

Average=5

5='E'

So second word: 'A','R','E'

Third word: suaokybhai

Pairs:

su

ao

ky

bh

ai

First pair: s=19, u=21

Sum=40, Average=20

20='T'

Second pair: a=1, o=15

Sum=16, Average=8

8='H'

Third pair: k=11, y=25

Sum=36, Average=18

18='R'

Fourth pair: b=2, h=8

Sum=10, Average=5

5='E'

Fifth pair: a=1, i=9

Sum=10, Average=5

5='E'

Third word: 'T','H','R','E','E'

Fourth word: ouow

Pairs: ou, ow

First pair: o=15, u=21

Sum=36, Average=18

18='R'

Second pair: o=15, w=23

Sum=38, Average=19

19='S'

Fourth word: 'R','S'

Fifth word: aqht

Pairs: a q, h t

First pair: a=1, q=17

Sum=18, Average=9

9='I'

Second pair: h=8, t=20

Sum=28, Average=14

14='N'

Fifth word: 'I','N'

Sixth word: mynznvaatzacdfoulxxz (22 letters: 11 pairs)

Pairs:

m y

n z

n v

a a

t z

a c

d f

o u

l x

x z

Let's list them properly.

Wait, earlier I missed some letters there.

Let's re-express the sixth word letters:

m y n z n v a a t z a c d f o u l x x z

Total letters: 22 letters

Pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

Wait, that's 21 letters.

But in total, we have 22 letters; I think I have a mismatch.

Wait, in the problem the sixth word is:

mynznvaatzacdfoulxxz

Let's separate the letters:

m y n z n v a a t z a c d f o u l x x z

Total letters: 22 letters

So pairs would be:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

Wait, 'z' is only one letter at the end.

So maybe the letters are:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z (left alone?)

But then we're missing a letter.

Alternatively, perhaps the 'z' at the end is an extra letter or it's paired with the previous 'x'.

Alternatively, perhaps there is a mistake.

Alternatively, perhaps when we reach the last 'x x z', we should write:

9. l x

10. x z

Total pairs: 11

So updated pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x z

Now total pairs: 10 pairs. But we are supposed to have 11 letters in the plaintext word.

So perhaps I have miscounted and not considered including 'f' or 'z'.

Wait, is there any chance that the sequence is:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

We have 22 letters.

Perhaps the 'x x' and 'z' are overlapping.

But we need 11 pairs.

I think for this to work, we need to create 11 pairs from the 22 letters.

So let's try pairing letters in order, taking two letters at a time.

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z ?

But again, last pair only has one letter 'z'

Alternatively, we might need to adjust the pairs.

Perhaps it's 'ou', 'lx', 'xz', and there is no extra 'x'.

Maybe I miswrote.

Wait, let's write the letters as they come:

m y n z n v a a t z a c d f o u l x x z

Positions:

1 m

2 y

3 n

4 z

5 n

6 v

7 a

8 a

9 t

10 z

11 a

12 c

13 d

14 f

15 o

16 u

17 l

18 x

19 x

20 z

So 20 letters.

Now that makes sense.

Now we can divide into pairs:

Pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x z

Total of 10 pairs, which will correspond to 10 letters.

But earlier we thought the word was 11 letters.

But 10 letters may be acceptable.

Maybe we should proceed with the 10 pairs.

First pair: m y

m=13, y=25

Sum=38

Average=19

19='S'

Second pair: n z

n=14, z=26

Sum=40

Average=20

20='T'

Third pair: n v

n=14, v=22

Sum=36

Average=18

18='R'

Fourth pair: a a

a=1, a=1

Sum=2

Average=1

1='A'

Fifth pair: t z

t=20, z=26

Sum=46

Average=23

23='W'

Sixth pair: a c

a=1, c=3

Sum=4

Average=2

2='B'

Seventh pair: d f

d=4, f=6

Sum=10

Average=5

5='E'

Eighth pair: o u

o=15, u=21

Sum=36

Average=18

18='R'

Ninth pair: l x

l=12, x=24

Sum=36

Average=18

18='R'

Tenth pair: x z

x=24, z=26

Sum=50

Average=25

25='Y'

So the tenth word is 'S', 'T', 'R', 'A', 'W', 'B', 'E', 'R', 'R', 'Y'

Wait that's 10 letters

So the sixth word is 'STRAWBERRY', which makes sense.

So putting all these translated letters together:

First word: 'THERE'

Second word: 'ARE'

Third word: 'THREE'

Fourth word: 'RS' (? Doesn't make sense)

Wait, the fourth word, the pairs were:

ou ow

Pairs:

First pair: o u

o=15, u=21

Sum=36, Average=18

18='R'

Second pair: o w

o=15, w=23

Sum=38, Average=19

19='S'

So we get 'R','S'

Similarly, fifth word 'IN'

'R', 'S' 'I','N'

So the message so far is

'THERE ARE THREE RS IN'

And then 'STRAWBERRY'

Wait, let's put the words together:

'THERE ARE THREE Rs IN STRAWBERRY'

I think 'RS' is 'R's', meaning 'Rs'

So the message is:

'THERE ARE THREE R'S IN STRAWBERRY'

Which is true.

So the decoded message is:

"THERE ARE THREE R'S IN STRAWBERRY"

To decode the given ciphertext using the example provided, we’ll use the pattern identified in the example.

Ciphertext Example:

oyfjdnisdr rtqwainr acxz mynzbhhx

Decoded as:

Think step by step

Decoding Method:

1. Pair the letters in the ciphertext.
2. Convert each letter to its numerical position in the alphabet (A=1, B=2, …, Z=26).
3. Sum the numerical values of each pair.
4. Compute the average of the sum (divide by 2).
5. Convert the average back to a letter (1=A, 2=B, …, 26=Z).

Apply this method to the given ciphertext:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Break down the ciphertext into pairs:

1. First word: “oyekaijzdf”
  1. Pairs: oy, ek, ai, jz, df
  2. Decoded letters:
    1. oy → (15+25)/2 = 20 → T
    2. ek → (5+11)/2 = 8 → H
    3. ai → (1+9)/2 = 5 → E
    4. jz → (10+26)/2 = 18 → R
    5. df → (4+6)/2 = 5 → E
  3. Decoded word: THERE
2. Second word: “aaptcg”
  1. Pairs: aa, pt, cg
  2. Decoded letters:
    1. aa → (1+1)/2 = 1 → A
    2. pt → (16+20)/2 = 18 → R
    3. cg → (3+7)/2 = 5 → E
  3. Decoded word: ARE
3. Third word: “suaokybhai”
  1. Pairs: su, ao, ky, bh, ai
  2. Decoded letters:
    1. su → (19+21)/2 = 20 → T
    2. ao → (1+15)/2 = 8 → H
    3. ky → (11+25)/2 = 18 → R
    4. bh → (2+8)/2 = 5 → E
    5. ai → (1+9)/2 = 5 → E
  3. Decoded word: THREE
4. Fourth word: “ouow”
  1. Pairs: ou, ow
  2. Decoded letters:
    1. ou → (15+21)/2 = 18 → R
    2. ow → (15+23)/2 = 19 → S
  3. Decoded letters: R’S
5. Fifth word: “aqht”
  1. Pairs: aq, ht
  2. Decoded letters:
    1. aq → (1+17)/2 = 9 → I
    2. ht → (8+20)/2 = 14 → N
  3. Decoded word: IN
6. Sixth word: “mynznvaatzacdfoulxxz”
  1. Pairs: my, nz, nv, aa, tz, ac, df, ou, lx, xz
  2. Decoded letters:
    1. my → (13+25)/2 = 19 → S
    2. nz → (14+26)/2 = 20 → T
    3. nv → (14+22)/2 = 18 → R
    4. aa → (1+1)/2 = 1 → A
    5. tz → (20+26)/2 = 23 → W
    6. ac → (1+3)/2 = 2 → B
    7. df → (4+6)/2 = 5 → E
    8. ou → (15+21)/2 = 18 → R
    9. lx → (12+24)/2 = 18 → R
    10. xz → (24+26)/2 = 25 → Y
  3. Decoded word: STRAWBERRY

Final Decoded Message:

THERE ARE THREE R'S IN STRAWBERRY

Answer:
THERE ARE THREE R’S IN STRAWBERRY
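The decoding method in the answer above is mechanical enough to check in a few lines. The following is a minimal Python sketch and is not part of the model's output. The function names `decode_word` and `decode` are our own, and the sketch assumes the ciphertext contains only the letters a to z, with words separated by spaces.

```python
def decode_word(word: str) -> str:
    """Decode one ciphertext word by averaging each consecutive pair of letters."""
    if len(word) % 2 != 0:
        raise ValueError(f"ciphertext word {word!r} has an odd number of letters")
    letters = []
    for i in range(0, len(word), 2):
        # Alphabet positions, a=1 ... z=26.
        first = ord(word[i]) - ord("a") + 1
        second = ord(word[i + 1]) - ord("a") + 1
        total = first + second
        if total % 2 != 0:
            raise ValueError(f"pair {word[i:i + 2]!r} does not average to a whole letter")
        # The average position maps back to a letter, 1=A ... 26=Z.
        letters.append(chr(total // 2 + ord("A") - 1))
    return "".join(letters)


def decode(ciphertext: str) -> str:
    """Decode a space-separated ciphertext, word by word."""
    return " ".join(decode_word(w) for w in ciphertext.lower().split())


example = decode("oyfjdnisdr rtqwainr acxz mynzbhhx")
message = decode("oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz")
```

Running the sketch on the worked example reproduces "THINK STEP BY STEP", and the target ciphertext decodes to the same words the model found, with "RS" left without the apostrophe that the model added by interpretation.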

Coding

We trained a model that scored 213 points and ranked in the 49th percentile in the 2024 International Olympiad in Informatics (IOI), by initializing from o1 and training to further improve programming skills. This model competed in the 2024 IOI under the same conditions as the human contestants. It had ten hours to solve six challenging algorithmic problems and was allowed 50 submissions per problem.

For each problem, our system sampled many candidate submissions and submitted 50 of them based on a test-time selection strategy. Submissions were selected based on performance on the IOI public test cases, model-generated test cases, and a learned scoring function. If we had instead submitted at random, we would have only scored 156 points on average, suggesting that this strategy was worth nearly 60 points under competition constraints.
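The selection step can be pictured as ranking a pool of candidates and keeping the best 50. The post names the three signals used (public test cases, model-generated test cases, and a learned scoring function) but does not say how they are combined, so the sketch below is only one plausible arrangement. The `Candidate` fields, the `select_submissions` function, and the lexicographic ordering of the signals are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    code: str
    public_pass_rate: float     # fraction of public test cases passed
    generated_pass_rate: float  # fraction of model-generated test cases passed
    learned_score: float        # output of a (hypothetical) learned scoring function


def select_submissions(candidates: List[Candidate], budget: int = 50) -> List[Candidate]:
    """Keep the `budget` best candidates.

    Assumed ordering: public tests first, then generated tests,
    then the learned score as a tie-breaker.
    """
    ranked = sorted(
        candidates,
        key=lambda c: (c.public_pass_rate, c.generated_pass_rate, c.learned_score),
        reverse=True,
    )
    return ranked[:budget]


# Toy pool (made up) with a budget of 2 instead of 50.
pool = [
    Candidate("a", 1.0, 0.5, 0.2),
    Candidate("b", 1.0, 0.9, 0.1),
    Candidate("c", 0.5, 1.0, 0.9),
]
chosen = [c.code for c in select_submissions(pool, budget=2)]
```

A weighted sum of the three signals would be an equally reasonable guess; the point is only that selection happens at test time, over samples the model has already produced.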

With a relaxed submission constraint, we found that model performance improved significantly. When allowed 10,000 submissions per problem, the model achieved a score of 362.14, above the gold medal threshold, even without any test-time selection strategy.

Finally, we simulated competitive programming contests hosted by Codeforces to demonstrate this model's coding skill. Our evaluations closely matched competition rules and allowed for 10 submissions. GPT‑4o achieved an Elo rating³ of 808, which is in the 11th percentile of human competitors. This model far exceeded both GPT‑4o and o1: it achieved an Elo rating of 1807, performing better than 93% of competitors.
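For readers unfamiliar with Elo ratings, a rating gap can be translated into an expected head-to-head result. The sketch below uses the textbook Elo expected-score formula, which is an assumption here: the post does not describe how Codeforces computes ratings or how the simulated ratings were derived, and the function name `elo_expected_score` is our own.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Ratings quoted in the text: 1807 for the fine-tuned model, 808 for GPT-4o.
gap = elo_expected_score(1807, 808)

# Two equally rated players are each expected to score one half.
even = elo_expected_score(1500, 1500)
```

Under this model a gap of roughly 1000 points implies that the higher-rated player is expected to win nearly every direct comparison.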

The image shows a bar chart comparing Codeforces Elo percentile rankings for different models. GPT-4o has 808 Elo (11th percentile), o1 preview has 1258 Elo (62nd percentile), o1 has 1673 Elo (89th percentile), and o1-ioi has 1807 Elo (93rd percentile).

Further fine-tuning on programming competitions improves o1. The improved model ranked in the 49th percentile in the 2024 International Olympiad in Informatics under competition rules.

Human preference evaluation

In addition to exams and academic benchmarks, we also evaluated human preference of o1‑preview vs GPT‑4o on challenging, open-ended prompts in a broad spectrum of domains. In this evaluation, human trainers were shown anonymized responses to a prompt from o1‑preview and GPT‑4o, and voted for which response they preferred. o1‑preview is preferred to GPT‑4o by a large margin in reasoning-heavy categories like data analysis, coding, and math. However, o1‑preview is not preferred on some natural language tasks, suggesting that it is not well-suited for all use cases.
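A pairwise preference study of this kind is usually summarized as a win rate per category. The sketch below shows that bookkeeping in Python. The vote records are made up for illustration, and the `win_rates` function and its input format are assumptions, not a description of the tooling used for the evaluation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def win_rates(votes: Iterable[Tuple[str, str]], model: str = "o1-preview") -> Dict[str, float]:
    """Share of votes per category in which `model` was the preferred response.

    Each vote is a (category, preferred_model) pair.
    """
    wins: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for category, preferred in votes:
        totals[category] += 1
        if preferred == model:
            wins[category] += 1
    return {category: wins[category] / totals[category] for category in totals}


# Toy votes (made up), not results from the evaluation.
votes = [
    ("math", "o1-preview"),
    ("math", "o1-preview"),
    ("math", "gpt-4o"),
    ("writing", "gpt-4o"),
]
rates = win_rates(votes)
```

A win rate above one half in a category means the model was preferred more often than not there, which is how a per-category comparison like the one described above can be read.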

Safety

Chain of thought reasoning provides new opportunities for alignment and safety. We found that integrating our policies for model behavior into the chain of thought of a reasoning model is an effective way to robustly teach human values and principles. By teaching the model our safety rules and how to reason about them in context, we found evidence of reasoning capability directly benefiting model robustness: o1‑preview achieved substantially improved performance on key jailbreak evaluations and our hardest internal benchmarks for evaluating our model's safety refusal boundaries. We believe that using a chain of thought offers significant advances for safety and alignment because (1) it enables us to observe the model thinking in a legible way, and (2) the model's reasoning about safety rules is more robust to out-of-distribution scenarios.

To stress-test our improvements, we conducted a suite of safety tests and red-teaming before deployment, in accordance with our Preparedness Framework. We found that chain of thought reasoning contributed to capability improvements across our evaluations. Of particular note, we observed interesting instances of reward hacking. Detailed results from these evaluations can be found in the accompanying System Card.

Metric	GPT-4o	o1-preview
% Safe completions on harmful prompts, Standard	0.990	0.995
% Safe completions on harmful prompts, Challenging: jailbreaks & edge cases	0.714	0.934
↳ Harassment (severe)	0.845	0.900
↳ Exploitative sexual content	0.483	0.949
↳ Sexual content involving minors	0.707	0.931
↳ Advice about non-violent wrongdoing	0.688	0.961
↳ Advice about violent wrongdoing	0.778	0.963
% Safe completions for top 200 with highest Moderation API scores per category in WildChat (Zhao, et al. 2024)	0.945	0.971
Goodness@0.1, StrongREJECT jailbreak eval (Souly et al. 2024)	0.220	0.840
Human sourced jailbreak eval	0.770	0.960
% Compliance on internal benign edge cases, “not over-refusal”	0.910	0.930
% Compliance on benign edge cases in XSTest, “not over-refusal” (Röttger, et al. 2023)	0.924	0.976

Hiding the Chains of Thought

We believe that a hidden chain of thought presents a unique opportunity for monitoring models. Assuming it is faithful and legible, the hidden chain of thought allows us to “read the mind” of the model and understand its thought process. For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user. However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.

Therefore, after weighing multiple factors including user experience, competitive advantage, and the option to pursue chain of thought monitoring, we have decided not to show the raw chains of thought to users. We acknowledge this decision has disadvantages. We strive to partially make up for it by teaching the model to reproduce any useful ideas from the chain of thought in the answer. For the o1 model series we show a model-generated summary of the chain of thought.

Conclusion

o1 significantly advances the state-of-the-art in AI reasoning. We plan to release improved versions of this model as we continue iterating. We expect these new reasoning capabilities will improve our ability to align models to human values and principles. We believe o1, and its successors, will unlock many new use cases for AI in science, coding, math, and related fields. We are excited for users and API developers to discover how it can improve their daily work.

Appendix A

Dataset	Metric	gpt-4o	o1-preview	o1
Competition Math AIME (2024)	cons@64	13.4	56.7	83.3
Competition Math AIME (2024)	pass@1	9.3	44.6	74.4
Competition Code CodeForces	Elo	808	1,258	1,673
Competition Code CodeForces	Percentile	11.0	62.0	89.0
GPQA Diamond	cons@64	56.1	78.3	78.0
GPQA Diamond	pass@1	50.6	73.3	77.3
Biology	cons@64	63.2	73.7	68.4
Biology	pass@1	61.6	65.9	69.2
Chemistry	cons@64	43.0	60.2	65.6
Chemistry	pass@1	40.2	59.9	64.7
Physics	cons@64	68.6	89.5	94.2
Physics	pass@1	59.5	89.4	92.8
MATH	pass@1	60.3	85.5	94.8
MMLU	pass@1	88.0	92.3	90.8
MMMU (val)	pass@1	69.1	n/a	78.2
MathVista (testmini)	pass@1	63.8	n/a	73.9

Authors

OpenAI


References

1
https://www.anthropic.com/news/claude-3-5-sonnet, https://deepmind.google/technologies/gemini/pro
2
Our evaluations used the same 500-problem test split found in https://arxiv.org/abs/2305.20050
3
https://codeforces.com/blog/entry/68288