2023 Postgraduate Entrance Exam (Kaoyan) English Reading: Computer Vision
Computer vision
Eye robot
Poor eyesight remains one of the main obstacles to letting robots loose among humans.
But it is improving, in part by aping natural vision.
ROBOTS are getting smarter and more agile all the time.
They disarm bombs, fly combat missions, put together complicated machines, even play football.
Why, then, one might ask, are they nowhere to be seen, beyond war zones, factories and technology fairs?
One reason is that they themselves cannot see very well.
And people are understandably wary of purblind contraptions bumping into them willy-nilly in the street or at home.
All that a camera-equipped computer sees is lots of picture elements, or pixels.
A pixel is merely a number reflecting how much light has hit a particular part of a sensor.
The challenge has been to devise algorithms that can interpret such numbers as scenes composed of different objects in space.
This comes naturally to people and, barring certain optical illusions, takes no time at all as well as precious little conscious effort.
Yet emulating this feat in computers has proved tough.
In natural vision, after an image is formed in the retina it is sent to an area at the back of the brain, called the visual cortex, for processing.
The first nerve cells it passes through react only to simple stimuli, such as edges slanting at particular angles.
They fire up other cells, further into the visual cortex, which react to simple combinations of edges, such as corners.
Cells in each subsequent area discern ever more complex features, with those at the top of the hierarchy responding to general categories like animals and faces, and to entire scenes comprising assorted objects.
All this takes less than a tenth of a second.
The outline of this process has been known for years and in the late 1980s Yann LeCun, now at New York University, pioneered an approach to computer vision that tries to mimic the hierarchical way the visual cortex is wired.
He has been tweaking his convolutional neural networks ever since.
Seeing is believing
A ConvNet begins by swiping a number of software filters, each several pixels across, over the image, pixel by pixel.
Like the brain's primary visual cortex, these filters look for simple features such as edges.
The upshot is a set of feature maps, one for each filter, showing which patches of the original image contain the sought-after element.
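The sweep described above can be sketched in a few lines. This is a minimal illustration, not Dr LeCun's actual code: the 3x3 vertical-edge filter and the toy 5x5 image are made up for the example.

```python
def convolve(image, kernel):
    """Slide `kernel` over `image` pixel by pixel; each output value
    scores how strongly the patch under the kernel matches the
    sought-after feature, yielding one feature map per filter."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    feature_map = []
    for y in range(h - kh + 1):
        row = []
        for x in range(w - kw + 1):
            row.append(sum(image[y + i][x + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        feature_map.append(row)
    return feature_map

# A bright vertical stripe down the middle of a dark image.
image = [[0, 0, 1, 0, 0] for _ in range(5)]
# A filter that responds to vertical edges (dark on the left, bright on the right).
vertical_edge = [[-1, 0, 1] for _ in range(3)]

fmap = convolve(image, vertical_edge)
# Strong positive responses mark the stripe's left edge, negative its right.
```

Real filters are learned rather than hand-written, but the sweep itself is exactly this nested loop.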
A series of transformations is then performed on each map in order to enhance it and improve the contrast.
Next, the maps are swiped again, but this time rather than stopping at each pixel, the filter takes a snapshot every few pixels.
That produces a new set of maps of lower resolution.
These highlight the salient features while reining in computing power.
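The snapshot-every-few-pixels step is, in modern terms, pooling. A minimal sketch, assuming a max over 2x2 patches (one common choice; the article does not say which summary the snapshot takes):

```python
def max_pool(feature_map, size=2):
    """Replace each size x size patch of the map with its largest
    response, producing a lower-resolution map that keeps the salient
    features while cutting the amount of computation downstream."""
    pooled = []
    for y in range(0, len(feature_map) - size + 1, size):
        row = []
        for x in range(0, len(feature_map[0]) - size + 1, size):
            row.append(max(feature_map[y + i][x + j]
                           for i in range(size) for j in range(size)))
        pooled.append(row)
    return pooled

fm = [[1, 3, 0, 2],
      [4, 2, 1, 0],
      [0, 1, 5, 6],
      [2, 0, 7, 1]]
pooled = max_pool(fm)  # a 2x2 map summarising the 4x4 input
```

Each output value says only "this feature appeared somewhere in this patch", which is what makes the later stages cheaper and more tolerant of small shifts in the image.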
The whole process is then repeated, with several hundred filters probing for more elaborate shapes rather than just a few scouring for simple ones.
The resulting array of feature maps is run through one final set of filters.
These classify objects into general categories, such as pedestrians or cars.
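The final classifying stage can be pictured as scoring each category from the top-level feature responses and picking the winner. A toy sketch; the category names and weights are invented for illustration:

```python
def classify(features, category_weights):
    """Score every category as a weighted sum of the top-level
    feature responses and return the highest-scoring one."""
    scores = {name: sum(w * f for w, f in zip(weights, features))
              for name, weights in category_weights.items()}
    return max(scores, key=scores.get)

# Hypothetical weights: how strongly each category responds to three
# made-up high-level features (tallness, wideness, wheel-like shapes).
category_weights = {
    "pedestrian": [1.0, 0.2, 0.0],
    "car":        [0.1, 0.9, 1.0],
}

label = classify([0.2, 0.8, 0.9], category_weights)
```

In a trained network these weights are learned along with the filters; here they are fixed by hand purely to show the scoring step.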
Many state-of-the-art computer-vision systems work along similar lines.
The uniqueness of ConvNets lies in where they get their filters.
Traditionally, these were simply plugged in one by one, in a laborious manual process that required an expert human eye to tell the machine what features to look for, in future, at each level.
That made systems which relied on them good at spotting narrow classes of objects but inept at discerning anything else.
Dr LeCun's artificial visual cortex, by contrast, lights on the appropriate filters automatically as it is taught to distinguish the different types of object.
When an image is fed into the unprimed system and processed, the chances are it will not, at first, be assigned to the right category.
But, shown the correct answer, the system can work its way back, modifying its own parameters so that the next time it sees a similar image it will respond appropriately.
After enough trial runs, typically 10,000 or more, it makes a decent fist of recognising that class of objects in unlabelled images.
This still requires human input, though.
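The guess-check-adjust loop described above can be miniaturised. Real ConvNets update their filters by backpropagation; this sketch substitutes a single linear unit with a perceptron-style update, purely to show the shape of the loop, and the tiny labelled dataset is invented:

```python
def train(samples, epochs=100, lr=0.1):
    """Repeatedly guess a label, compare with the correct answer,
    and nudge the parameters whenever the guess was wrong."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, label in samples:
            guess = 1 if sum(w * f for w, f in zip(weights, features)) + bias > 0 else 0
            error = label - guess  # zero when the guess was already right
            weights = [w + lr * error * f for w, f in zip(weights, features)]
            bias += lr * error
    return weights, bias

# Toy labelled data: two separable "categories" of two-pixel images.
samples = [([0.0, 0.1], 0), ([0.2, 0.0], 0),
           ([0.9, 1.0], 1), ([1.0, 0.8], 1)]
weights, bias = train(samples)

def predict(features):
    return 1 if sum(w * f for w, f in zip(weights, features)) + bias > 0 else 0
```

A ConvNet runs the same loop over thousands of images and millions of parameters, but the principle, correct toward the known answer and move on, is the same.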
The next stage is unsupervised learning, in which instruction is entirely absent.
Instead, the system is shown lots of pictures without being told what they depict.
It knows it is on to a promising filter when the output image resembles the input.
In a computing sense, "resemblance" is gauged by the extent to which the input image can be recreated from the lower-resolution output.
When it can, the filters the system had used to get there are retained.
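The reconstruction test can be sketched with the simplest possible stand-in for a filter, 2x2 averaging; the filter, the upsampling rule and the toy images are all illustrative, not taken from the actual system:

```python
def downsample(image):
    """The 'filter': average each 2x2 patch into one value,
    producing the lower-resolution output."""
    return [[(image[y][x] + image[y][x + 1]
              + image[y + 1][x] + image[y + 1][x + 1]) / 4
             for x in range(0, len(image[0]), 2)]
            for y in range(0, len(image), 2)]

def upsample(small):
    """Rebuild a full-size image by repeating each low-res value."""
    return [[small[y // 2][x // 2] for x in range(2 * len(small[0]))]
            for y in range(2 * len(small))]

def reconstruction_error(image):
    """Total pixel difference between the input and its round trip
    through the filter; small error means a promising filter."""
    rebuilt = upsample(downsample(image))
    return sum(abs(a - b) for row_a, row_b in zip(image, rebuilt)
               for a, b in zip(row_a, row_b))

smooth = [[5, 5, 1, 1], [5, 5, 1, 1], [2, 2, 8, 8], [2, 2, 8, 8]]
noisy  = [[9, 0, 0, 9], [0, 9, 9, 0], [9, 0, 0, 9], [0, 9, 9, 0]]
# The smooth image survives the round trip; the noisy one does not,
# so this filter would be retained for inputs like the first.
```

In the real system the candidate filters are learned and compared by this criterion, rather than fixed averaging, but the retain-if-reconstructible logic is the one the article describes.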
In a tribute to nature's nous, the lowest-level filters arrived at in this unaided process are edge-seeking ones, just as in the brain.
The top-level filters are sensitive to all manner of complex shapes.
Caltech-101, a database routinely used for vision research, consists of some 10,000 standardised images of 101 types of just such complex shapes, including faces, cars and watches.
When a ConvNet with unsupervised pre-training is shown the images from this database it can learn to recognise the categories more than 70% of the time.
This is just below what top-scoring hand-engineered systems are capable of, and those tend to be much slower.
This approach, which Geoffrey Hinton of the University of Toronto, a doyen of the field, has dubbed "deep learning", need not be confined to computer vision.
In theory, it ought to work for any hierarchical system: language processing, for example.
In that case individual sounds would be low-level features akin to edges, whereas the meanings of conversations would correspond to elaborate scenes.
For now, though, ConvNet has proved its mettle in the visual domain.
Google has been using it to blot out faces and licence plates in its Streetview application.
It has also come to the attention of DARPA, the research arm of America's Defence Department.
This agency provided Dr LeCun and his team with a small roving robot which, equipped with their system, learned to detect large obstacles from afar and correct its path accordingly, a problem that lesser machines often, as it were, trip over.
The scooter-sized robot was also rather good at not running into the researchers.
In a selfless act of scientific bravery, they strode confidently in front of it as it rode towards them at a brisk walking pace, only to see it stop in its tracks and reverse.
Such machines may not quite yet be ready to walk the streets alongside people, but the day they can is surely not far off.