In the summer of 1958, a group of reporters is gathered by the Office of Naval Research in Washington, D.C., for a demonstration by a 29-year-old researcher at the Cornell Aeronautical Laboratory named Frank Rosenblatt. Rosenblatt has built something he calls the "perceptron", and in front of the assembled press corps he shows them what it can do.
Rosenblatt has a deck of flashcards, each of which has a colored square on it, either on the left side of the card or on the right. He pulls one card out of the deck and places it in front of the perceptron's camera. The perceptron takes it in as a black-and-white, 20-by-20-pixel image, and each of those four hundred pixels is turned into a binary number: 0 or 1, dark or light. The four hundred numbers, in turn, are fed into a rudimentary neural network, the kind that McCulloch and Pitts had imagined in the early 1940s. Each of these binary pixel values is multiplied by an individual negative or positive "weight", and then they are all added together. If the total is negative, the machine outputs a -1 (meaning the square is on the left), and if it's positive, it outputs a 1 (meaning the square is on the right).
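To make the mechanism concrete, here is a minimal Python sketch of the forward pass just described. The function name, the convention that a total of exactly zero counts as positive, and the random stand-in values for the pixels and weights are illustrative assumptions, not Rosenblatt's hardware.

```python
import random

def perceptron_output(pixels, weights):
    """Weighted sum of 400 binary pixels, squashed to an all-or-nothing output:
    -1 means "square on the left", +1 means "square on the right"."""
    total = sum(p * w for p, w in zip(pixels, weights))
    return 1 if total >= 0 else -1   # a total of exactly zero counts as +1: an arbitrary convention

# A 20-by-20 image flattened to 400 binary values, and 400 random initial weights.
pixels = [random.randint(0, 1) for _ in range(400)]
weights = [random.uniform(-1.0, 1.0) for _ in range(400)]
print(perceptron_output(pixels, weights))   # with random weights, this guess is nonsense
```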
The perceptron's four hundred weights are initially random, and its outputs, as a result, are nonsense. But every time the system guesses "wrong", Rosenblatt "trains" it, by dialing up the weights that were too low and turning down the weights that were too high.
Fifty of these trials later, the machine now consistently tells left-side cards and right-side cards apart, including ones he hasn't shown it before.
The demonstration itself is strikingly modest, but it signifies something grander. The machine is, in effect, learning from experience: what Rosenblatt calls a "self-induced change in the wiring diagram".
McCulloch and Pitts had imagined the neuron as a simple unit of input and output, of logic and arithmetic, and they had shown the enormous power of such rudimentary mechanisms, in great enough numbers and suitably connected. But they had said next to nothing about how exactly the "suitably connected" part was actually meant to be achieved.
"Rosenblatt made a very strong claim, which at first I didn't believe", says MIT's Marvin Minsky, coincidentally a former classmate of Rosenblatt's at the Bronx High School of Science. "He said that if a perceptron was physically capable of being wired up to recognize something, then there would be a procedure for changing its responses so that eventually it would learn to carry out the recognition. Rosenblatt's conjecture turned out to be mathematically correct, in fact. I have a tremendous admiration for Rosenblatt for guessing this theorem, since it's very hard to prove".
The perceptron, simple as it is, forms the blueprint for many of the machine-learning systems we will go on to discuss. It contains a model architecture: in this case, a single artificial "neuron" with four hundred inputs, each with its own "weight" multiplier, which are then summed together and turned into an all-or-nothing output. The architecture has a number of adjustable variables, or parameters: in this case, the positive or negative multipliers attached to each input. There is a set of training data: in this case, a deck of flashcards with one of two types of shapes on them. And the model's parameters are tuned using an optimization algorithm, or training algorithm.
The basic training procedure for the perceptron, as well as for many of its contemporary progeny, has a technical-sounding name -- "stochastic gradient descent" -- but the principle is utterly straightforward. Pick one of the training examples at random ("stochastic") and input it to the model. If the output is exactly what you want, do nothing. If there is a difference between what you wanted and what you got, then figure out in which direction ("gradient") to adjust each weight -- whether by the literal turning of physical knobs or simply the changing of numbers in software -- to lower the error for this particular example. Move each of them a little bit in the appropriate direction ("descent"). Pick a new example at random and start again. Repeat as many times as necessary.
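Here is a minimal sketch of that loop, assuming -1/+1 labels and using the classic perceptron error-correction update as a stand-in for "figure out which direction to adjust each weight"; the train function, learning rate, and step count are illustrative choices, not Rosenblatt's original procedure.

```python
import random

def perceptron_output(pixels, weights):
    # Same forward pass as in the earlier sketch.
    total = sum(p * w for p, w in zip(pixels, weights))
    return 1 if total >= 0 else -1

def train(examples, steps=2000, learning_rate=0.1):
    """examples: a list of (pixels, target) pairs, with target in {-1, +1}."""
    n_inputs = len(examples[0][0])
    weights = [random.uniform(-0.1, 0.1) for _ in range(n_inputs)]
    for _ in range(steps):
        pixels, target = random.choice(examples)    # "stochastic": pick an example at random
        guess = perceptron_output(pixels, weights)
        if guess == target:
            continue                                # output is exactly right: do nothing
        for i, p in enumerate(pixels):
            # Nudge each weight a little in the direction ("gradient") that lowers
            # the error for this particular example ("descent").
            weights[i] += learning_rate * (target - guess) * p
    return weights
```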
This is the basic recipe for the field of machine learning -- and the humble perceptron will prove to be both an overestimate and an underestimate of what is to come.
"The Navy", reports the New York times, "revealed the embryo of an electric computer today that it expects will be able to walk, talk, see, write, reproduce itself and be concious of its existence."
The New Yorker writes that the perceptron, "as its name implies, is capable of original thought." "Indeed", they write, "it strikes us as the first serious rival to the human brain ever devised."
Says Rosenblatt to the New Yorker reporter, "Our success in developing the perceptron means that for the first time a non-biological object will achieve an organization of its external environment in a meaningful way. That's a safe definition of what the perceptron can do. My colleague disapproves of all the loose talk one hears nowadays about mechanical brains. He prefers to call our machine a self-organizing system, but, between you and me, that's precisely what any brain is."
That same year, New Scientist publishes an equally hopeful, and slightly more sober, article called "Machines Which Learn." "When machines are required to perform complicated tasks it would often be useful to incorporate devices whose precise mode of operation is not specified initially," they write, "but which learn from experience how to do what is required. It would then be possible to produce machines to do jobs which have not been fully analysed because of their complexity. It seems likely that learning machines will play a part in such projects as the mechanical translation of languages and the automatic recognition of speech and of visual patterns."
"The use of the term 'learning machine' invites comparison with the learning of people and animals," the article continues. "The drawing of analogies between brains and machines requires caution to say the least, but in general way it is stimulating for workers in either field to know something of what is happening in the other, and it is possible that speculation about machines which learn may eventually produce a system which is a true analogue of some form of biological learning."
Artificial intelligence, as everyone knows, has gone through periods of great hope and deep disappointment, and the future the perceptron seemed to promise has been a very long time coming.
A few years later, Rosenblatt himself would lament that the press coverage had been overenthusiastic and short on restraint, and concede that his own early reports had lacked rigorous mathematical proof.
Minsky, despite his "tremendous admiration" for Rosenblatt and his machine, begins "to worry about what such a machine could not do." In 1969, he and his MIT colleague Seymour Papert publish a book called Perceptrons that effectively slams the door shut on the entire vein of research. Minsky and Papert show, with the stiff formality of mathematical proof, that there are seemingly basic patterns that Rosenblatt's model will simply never be able to recognize. For instance, it is impossible to train one of Rosenblatt's machines to recognize whether a card has an odd or an even number of squares on it. The only way to recognize more complex categories like this is to use a network with multiple layers, the earlier layers creating a representation of the raw data and the later layers operating on that representation. But no one knows how to tune the parameters of the early layers to make their representations useful to the later ones. The field hits the equivalent of a brick wall. "There had been several thousand papers published on perceptrons up to 1969", says Minsky. "Our book put a stop to those."
It is as if a dark cloud has settled over the field, and everything falls apart: the research, the money, the people. Pitts, McCulloch, and Lettvin, who have all three moved to MIT, are sharply exiled after a misunderstanding with MIT's Norbert Wiener, who had been like a second father figure to Pitts and now won't speak to him. Pitts, alcoholic and depressed, throws all of his notes and papers into a fire, including an unpublished dissertation about three-dimensional neural networks that MIT tries desperately to salvage. Pitts dies from cirrhosis in May 1969, at the age of 46. A few months later Warren McCulloch, at the age of 70, succumbs to a heart seizure after a long series of cardiopulmonary problems. In 1971, while celebrating his 43rd birthday, Frank Rosenblatt drowns in a sailing accident on the Chesapeake Bay.
By 1973, both the US and British governments have pulled their funding support for neural network research, and when a young English psychology student named Geoffrey Hinton declares that he wants to do his doctoral work on neural networks, again and again he is met with the same reply: "Minsky and Papert", he is told, "have proved that these models were no good."
Sourced, and made with love, from the captivating book "The Alignment Problem: Machine Learning and Human Values" by Brian Christian.
Enjoy!