"Those Weird and Worrying Questions," Issue 99: On the Infinity of Twitter (1)

TWITTER

On the Infinity of Twitter

Q. How many unique English tweets are possible? How long would it take for the population of the world to read them all out loud?

- Eric H, Hopatcong, NJ

High up in the North in the land called Svithjod, there stands a rock. It is a hundred miles high and a hundred miles wide. Once every thousand years a little bird comes to this rock to sharpen its beak. When the rock has thus been worn away, then a single day of eternity will have gone by.

- Hendrik Willem Van Loon

A. Tweets are 140 characters long. There are 26 letters in English, or 27 if you include spaces. Using that alphabet, there are 27^140 ≈ 10^200 possible strings.
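The 27^140 ≈ 10^200 figure is easy to check directly. The sketch below is only an illustration, not part of the original article; the names `TWEET_LENGTH` and `ALPHABET_SIZE` are ours. It uses Python's arbitrary-precision integers to count the digits of 27^140 exactly, then cross-checks with a logarithm.

```python
import math

TWEET_LENGTH = 140   # characters per tweet, as stated in the text
ALPHABET_SIZE = 27   # 26 English letters plus the space

# Python integers have arbitrary precision, so this count is exact.
total_strings = ALPHABET_SIZE ** TWEET_LENGTH

# A number with d decimal digits lies between 10^(d-1) and 10^d,
# so the digit count minus one is the power of ten.
exponent = len(str(total_strings)) - 1
print(f"27^140 has {exponent + 1} digits, so it is roughly 10^{exponent}")

# Cross-check: log10(27^140) = 140 * log10(27)
log_estimate = TWEET_LENGTH * math.log10(ALPHABET_SIZE)
print(f"log10(27^140) = {log_estimate:.2f}")
```

Both routes should agree on an exponent of 200, matching the rounded figure in the text.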

But Twitter doesn’t limit you to those characters. You have all of Unicode to play with, which has room for over a million different characters. The way Twitter counts Unicode characters is complicated, but the number of possible strings could be as high as 10^800.
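For a rough sense of where a number like 10^800 comes from, here is a back-of-the-envelope upper bound. It rests on a simplifying assumption of ours, not on Twitter's actual counting rules: every one of the 140 positions can hold any of the 1,114,112 code points in Unicode's code space (U+0000 through U+10FFFF). That overcounts, since the code space includes surrogates and unassigned code points. The helper name `log10_string_count` is hypothetical.

```python
import math

TWEET_LENGTH = 140
# Unicode's code space runs from U+0000 to U+10FFFF inclusive.
UNICODE_CODE_POINTS = 0x10FFFF + 1  # 1,114,112

def log10_string_count(alphabet_size: int, length: int) -> float:
    """Return log10 of alphabet_size ** length (the number of strings)."""
    return length * math.log10(alphabet_size)

# Assumption: one code point per position, all code points allowed.
upper_bound_exponent = log10_string_count(UNICODE_CODE_POINTS, TWEET_LENGTH)
print(f"at most about 10^{upper_bound_exponent:.0f} strings")
```

Under that assumption the exponent comes out in the mid-800s, the same ballpark as the figure above. The exact value depends on how characters are counted, which the text notes is complicated.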

Of course, almost all of them would be meaningless jumbles of characters from a dozen different languages. Even if you’re limited to the 26 English letters, the strings would be full of meaningless jumbles like “ptikobj.” Eric’s question was about tweets that actually say something in English. How many of those are possible?

This is a tough question. Your first impulse might be to allow only English words. Then you could further restrict it to grammatically valid sentences.

But it gets tricky. For example, “Hi, I’m Mxyztplk” is a grammatically valid sentence if your name happens to be Mxyztplk. (Come to think of it, it’s just as grammatically valid if you’re lying.) Clearly, it doesn’t make sense to count every string that starts with “Hi, I’m . . .” as a separate sentence. To a normal English speaker, “Hi, I’m Mxyztplk” is basically indistinguishable from “Hi, I’m Mxzkqklt,” and shouldn’t both count. But “Hi, I’m xPoKeFaNx” is definitely recognizably different from the first two, even though “xPoKeFaNx” isn’t an English word by any stretch of the imagination.

Our way of measuring distinctiveness seems to be falling apart. Fortunately, there’s a better approach.

Let’s imagine a language that has only two valid sentences, and every tweet must be one of the two sentences. They are:

• “There’s a horse in aisle five.”

• “My house is full of traps.”

Published: 2024-02-29 05:32:00  Contributor: Aucao