These answers were written by Alan Kay on Quora.

Did Alan Kay know Bill Atkinson? Was there any cross-fertilization between SmallTalk and HyperCard/Talk?

I first met Bill at the famous Steve Jobs visit to Parc to see the Smalltalk system, its GUI, etc. (I might have run into him earlier). After a stint at Atari, Steve Jobs hired me in 1984 as an Apple Fellow (Butler and I had been made Fellows towards the end of our stay at Xerox). Bill had been made an Apple Fellow for his many contributions, especially to the Lisa and Macintosh projects. He was very talented and creative, etc. and our relationship could be characterized as “brotherly”.

Fellows in most organizations are both autonomous and have access to a research stipend that allows research projects of a few people to be started without needing signoff from management.

Bill liked the “Dynabook” idea and his slant on it was that what you saw on the screen was a particular “card” of an unlimited number that you could find, link to, etc. He called the machine “Magic Slate” and the SW part “Wild Card”. He did a subset of this as a Mac “accessory” called “Rolodex”, and I thought it was the neatest greatest little thing I’d seen on a personal computer. Part of Bill’s genius was in “small things done wonderfully”, and this was an example.

He and his little group made a larger version with more capabilities (roughly what came to be known as “Hypercard level 4”) — basically most things but leaving out scripting.

One day Bill came to me in tears. He had just been turned down by Apple Product when he tried to get them to develop and ship “Wild Card” on Macs.

I was friendly with John Sculley, and said let’s get John to see this. John liked it and then said after finding out about the turn-down, “I’ll push this through”. (This made me realize that Apple was in a bit of trouble: a CEO should never have to be the Champion of a product in a company that is running correctly.)

I said to John, when you do this, make sure that this can be scripted: this is the main thing it lacks right now.

I realized that I didn’t have the particular “end-user sensibility” that Bill had, and decided not to butt in regarding the design of the scripting (and I was right: I would have ruined the naive charm of Hypertalk). They found Dan Winkler to be the main designer of the scripting part, and he and his team did a great job that didn’t break the charm and feel of the original idea.

Another key player was Carol Kaehler, who designed much of the surrounding “user amenities” and did most of the “Help Stack”.

Because Sculley was driving it, the development effort was very well funded and was the best overall example I’d seen of developing a product from a research prototype.

Hypercard (there was a trademark conflict with the original name) was not just an instant success, but in a few years Apple was able to identify at least 4 million creators of new HyperApps — making it the most successful programming system in history.

I regarded it as a breakthrough for the “persons” who used “personal computers” in that it gave the first glimpse of the many things that needed to not just be done well, but done well in the psychic space of actual end-users.

Smalltalk had very little influence on Hypercard — besides its influence on the Mac — and vice versa. However, Hypercard had a great influence on the Etoys system our group did later at Disney, which was an attempt to make a system for children that combined what was great about Hypercard, Smalltalk, Star Logo, etc., (and to eliminate what was not so good for end-users). This wound up working really well, and it would not have been as good without Hypercard having been done.

Apple the company — and Steve, the returned “saviour” for Apple — did not understand the Internet, and that Hypercard was a fantastic model for a truly great browser. Instead, it was brushed aside and much more mundane ideas — pretty much everywhere in the general business — ruled to the eventual detriment of most end-users today.

What role did the unique features of the Alto computer play in the development of programming languages like Smalltalk and Lisp?

I realized that I was being unjustifiably harsh in my first paragraph (now the 2nd) below. It is actually a more reasonable question than I initially took it to be. The unique features of the Alto did play a part in the development of Smalltalk, in part because those features were put into the Alto precisely because we knew they would help in our (and the rest of Parc’s) VHLL research efforts. In sum: (a) we knew “sort of” how things were going to go, but did not yet have the details that would make a pure HW design possible, and (b) we needed to be able to make at least 100 machines, and a pure HW version — even if we had known all the answers — would have been too expensive per unit. One of my slogans back then was “HW is just SW crystallized early”, and the ratios of costs and capacities indicated that a “meta-machine” using Wilkes’ invention of microcode — and making it dynamically loadable — would be a very good tradeoff. This approach was not controversial in the Parc research community, even though it remained so outside.

---

The simplest and most important part of the answer is that the question is exactly backwards — in the sense that the Alto was explicitly designed to be a good foundation for Very High Level Languages such as Lisp (implemented ca 1959) and Smalltalk (the earliest working version predated the Alto).

This should make some sense when contemplated. The only reason for a programmable computer is to run programs, and being able to invent HLLs and VHLLs to make it easier to write programs would greatly facilitate the whole process. The problem was that things had started out with HW that programmers were supposed to then use. It was most generally the case that powerful ideas in software organization very often didn’t find matching features in existing hardware, and emulating them in SW was sometimes too inefficient.

A real forcing function was provided by Algol and a solution was the Burroughs B5000 (main designer Bob Barton) which was hardware that matched up to what was important to the language (and also to what was being learned about scaling and complex systems). I have written other Quora answers about this machine.

Still newer ideas again put a strain on the B5000, and it was realized by the mid-60s or so that a much better approach would be to make a “meta-machine” where the invention of microcode by Maurice Wilkes could be exploited. The basic approach here was to exploit the difference in speeds between core memory RAM and faster, more expensive ways to make smaller memories and simple processors. The idea to visualize is that if the inner machine was always waiting on core to cycle, then what it was doing couldn’t have been done faster with straight logic anyway — so you were emulating logic without losing overall throughput.

So you not only could simulate any architecture, but — if the microcode memory was dynamically loadable — you could incorporate new understandings of process description simply by writing new microcode, and you would — in effect — be creating the “HW” on the fly that your new SW ideas required.

The Alto didn’t originate these ideas, but it was one of the earliest to put all of the elements together in a practical fashion. Part of the reason we could build so many Altos was that the microcode could also emulate what was normally HW, and thus reduce the overall costs of building many machines (we wound up building about 2000 of them).

So the process was to think about advances in language semantics, ask questions about representations, and then create the machine architecture well suited to these via microcoding. All the VHLLs we used at Parc had their own custom microcode that defined the kernel “HW” in the form of a SW virtual machine. When you started up e.g. Lisp or Smalltalk, what got loaded first was the microcode that defined the particular virtual machine, etc.

All the computers we built at Parc used this idea. This meant that only a few thousand lines of code were needed to allow any of these machines to run any of the SW environments we were experimenting with.
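To make the “dynamically loadable microcode defines the machine” idea concrete, here is a minimal sketch in Python — obviously not Parc microcode, and every name in it is invented for illustration. The outer machine is fixed; its “instruction set” is a loadable table, so swapping the table in effect creates different “HW” for different virtual machines.

```python
# Toy illustration (not the Alto's actual microcode): a "meta-machine" whose
# instruction set is itself a loadable table, so new "hardware" can be
# defined simply by loading a different microcode dictionary.

def make_machine(microcode):
    """Return an interpreter whose primitive operations come from `microcode`."""
    def run(program, stack=None):
        stack = [] if stack is None else stack
        for op, *args in program:
            microcode[op](stack, *args)   # each "instruction" is emulated by code
        return stack
    return run

# One loadable "personality": a tiny stack machine for arithmetic.
arith_microcode = {
    "push": lambda s, x: s.append(x),
    "add":  lambda s: s.append(s.pop() + s.pop()),
    "mul":  lambda s: s.append(s.pop() * s.pop()),
}

run = make_machine(arith_microcode)
print(run([("push", 3), ("push", 4), ("add",), ("push", 2), ("mul",)]))  # [14]

# Loading a different microcode table would make the same outer machine behave
# like a different computer -- the "HW" created on the fly for new SW ideas.
```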

How much did Daniel Bobrow and Alan Kay interact at PARC in the days that led to the invention of modern Object-Oriented Programming? Is Bobrow and Winograd's "An Overview of KRL" (1976) the first publication using OO in its modern meaning?

Danny and I were friends from the time I first met him — he was a researcher from MIT, by then at BBN, and I was a first year grad student (this was probably in 1967).

He was a very friendly guy and was both interested in AI (meaning AGI back then) and programming language ideas — he was one of many such ARPA researchers who bounced ideas off each other. We were all overjoyed a few years later when Parc started up and he joined the three ring circus there.

We continued our friendship, tennis competitions, and talk about language architectures. Danny was a very helpful editor and critic on my first public Dynabook paper (ACM Nat’l Convention 1972).

One of the most interesting and best uses of Smalltalk at Parc was as the implementation vehicle for the PIE system by Danny Bobrow and Ira Goldstein. This advanced several concepts of OO and of temporal tagging, etc.

I don’t know what the first publication to use OO in its modern meaning was (I’m a bit afraid to ask what this might mean), but it would have happened way before 1976.

Has anyone without a professional background in engineering, science, mathematics, etc. ever invented something solely based on a great idea?

There are already lots of good answers to this, so I’ll just add a larger observation made by the philosopher/mathematician Alfred North Whitehead: “The main invention of the 19th century was Invention itself”.

He pointed to the immense boom in “letters patent” in Great Britain to show that almost everyone seemed to be inventing: as though it was in the air. It was a topic of conversation at parties, etc.

The detailed history of how a few key inventions that started the industrial revolution also made it “a thing” to invent something is too long for Quora. But you can easily find books on many actually crazy inventions that were patented: one that comes to mind is a hat that would tip itself to a lady if you moved your head correctly!

How long have you been designing a computer system?

I started “coding as a real job” in the US Air Force in 1962. In those days “coding” and “programming” had rather different meanings. “Coding” was essentially being a human compiler to turn designs (mostly in terms of flow charts) into working machine code by way of writing in an assembly language. Some of the assemblers we used — such as IBM Autocoder — had fairly powerful abilities to construct macros, so design started to creep into one’s coding skills.

“Programming” included the design of algorithms and systems.

Most coders started doing some design soon after writing their first programs. One of the first real systems designs I did was a complex scheme for merge sorting with very slow tape drives, which was likely done around the end of 1962 or beginning of 1963.

So the answer is about 63 years …

I have heard so many amazing stories of PARC where innovation was driven by the creativity and curiosity of the developers. Does this type of scenario exist anywhere in today's software industry?

In the early ‘80s Regis McKenna, one of the shapers of the Silicon Valley culture, clarified terms by pointing out that “invention” was what places like Xerox Parc did, and “innovation” was what companies like Apple did. The latter takes much more work and funding than the former, but doesn’t help anyone without a hefty admixture of “invention”.

Despite the smaller costs of real invention — and the incredible ROIs from the successes — most of the best funding diminished to below critical masses in the ’80s and is still quite non-existent (as far as I can tell). For example, just the profits from the invention of laser printing at Parc paid for the entire cost of Parc over the first 10 or so years (its most productive period) by roughly a factor of 200. The entire output of Parc constituted the invention of an entire new industry.

Another nitpick with terminology: it would be most appropriate to call the people at Xerox Parc “researchers” rather than “developers”.

I am aware of a number of “working group” sized organizations today whose process and level of researchers — both creatively and technically — are comparable to Parc. In the not too distant past I was part of one of these (Viewpoints Research Institute). Two of several really interesting ones today are Ink And Switch (James Lindenbaum) and DynamicLand (by Bret Victor). They are both “vision-based” and have been willing and able to “problem find” rather than to just try to make something that can be sold.

They are doing good things — and real research — but my experience whispers that they need to be part of a larger research institution to be as effective as e.g. Parc.

Parc was an organization “the next size larger”, so it had quite a few “working group” sized activities. Looking back, I think this was critical — and critical even beyond the imagination of Bob Taylor, who did so much to shape the culture at Parc. We found this out after the Parc diaspora in the 80s, when the views of Xerox and of the researchers diverged to the point of breakup. The “working groups” found homes in the 80s, but without the synergies from the larger situation at Parc, the “secret sauce” that made it really special was gone. (Something similar happened — on an even larger scale — during the “divestiture period” of ATT, which eventually dissolved Bell Labs.)

An interesting fact is that the cost (and headcount) of Xerox Parc were still quite small, but there was definitely a “Goldilocks” factor in what actually constituted a critical mass of people and ideas. The nature of the funding (and funders) that made the ARPA/Parc research community possible no longer exists anywhere in the world, as far as I can tell.

Could you critique my short essay about “the computer as a medium” (Amiga → Smalltalk) and tell me what doesn’t hold up?

I don’t have a link to this essay — or did I miss something?

---

Added 12 Feb 2026

Please forgive me in advance for only working from a machine translation of your article.

The first thing to keep in mind for anyone attempting an historical account is that the complex past at the very least needs to be thought about in terms of parallel, often intersecting, threads, whereas narrative (writing) is a single dimensional medium. This enormous difference has to be addressed in one way or another.

For example, most histories don’t reflect the parallel threads by supplying a simple 2+D map showing the main threads (the “+” is because the threads may have to overlap in some cases). A map would help this article.

Also, extreme care about clarity in the early parts will help greatly.

For example, your title (translated to English) is

“When the computer was a medium - From Amiga to Smalltalk”

This seems to be an historical statement, but it misleads — it is actually your personal journey — which started with your encounter with a computer created in the 80s and then wound up with meeting a software system invented about 15 years earlier.

I would suggest that the second part be changed to — from the Amiga back to Smalltalk. This more accurately reflects what you describe in your article, and also helps the reader.

In fact, part of what you like about the Amiga is the overlapping windows and pointing UI, and the theory behind the look and feel, and these were derived from the Parc GUI which originated in Smalltalk.

Most histories of events to which I was actually present are not very accurate — with inaccuracies both small and large. Nitpicking these errors made by well-meaning authors is a bit beside the point if the larger article conveys an important idea. I think you succeed in the large here, and that the article makes a positive contribution.

But, since you asked me to critique it: just for the record …

My thoughts about media mostly came in the 1960s from grappling with understanding Marshall McLuhan (who had many of the most important insights about media). My little addition was to notice that — since the computer could simulate anything that could be described — it was a true metamedium, and that meant there were likely media on the computer that could only exist because of the computer.

I think I would point out that the Alto at Parc could do excellent versions of real-time: graphics, animation, sound, music, multitasking, and programming coexisting together. My much later comment on the Amiga was because I thought it was the first and most successful consumer level project in the 80s to be able to do what our research machine in the 70s had pioneered.

Atari could have made something of its early investment/loan in the Amiga project (IIRC it was about $500K) when Atari got in trouble and needed a new product to bail itself out.

By the way, “Ted Hoff” was the inventor of the first on-chip CPU (the Intel 4004). He and I were good friends, and shared an office while at Atari.

“The conceptual connection” between Smalltalk and the Amiga can be traced back to the 1950s, and especially to the ARPA computer research community. There were many important systems predating Smalltalk that had the same feel.

Many of the things that made Smalltalk feel special were due to my partner in crime at Parc: Dan Ingalls. What you like about Smalltalk was due to an entire group happily working together.

I think this is more than you asked for, and almost certainly more than most readers would wish.

Best wishes

Alan

If you brought back Alan Turing, John von Neumann, and Claude Shannon today and gave them 10 years to learn modern science, would they still revolutionize AI and cloud computing?

Genius is genius in any age. Effective genius has to do with raw capacity, inner sense of style, and the era in which they live.

One answer to the question is that all three of these would be considered geniuses today. They would not necessarily be as effective — they might be less, or even more so — in today’s situations.

I’m answering like this because there are super high talents in every age, but there’s a sense in which the age singles out the talents that will be most effective for that age. I’m thinking of many situations in the 60s when the scarcity of resources combined with high ideals “found” researchers who could deal brilliantly with the combination. Our age would no doubt single out researchers with a feel for large scales.

In many cases, some would be as “effectively genius” in any age (I’m thinking of someone like Butler Lampson …).

How do programmers reason about code?

If we look at typical samples of code, then we would be justified in concluding an answer similar to this: “very differently, and not very well in general” …

What do you think of Rob Pike's assertion that "data structures, not algorithms, are central to programming"?

Looking at some of the other answers will be worthwhile. (Alon Amit provides more context.)

Before I try to answer this, let me ask those who are willing to comment what they see as the immense deadly flaw in this perspective? (Please comment rather than upvoting at this point!)

---

Added Jan 20th 2026 — OK. I mentioned before that I don’t like to write criticisms because they seem to morally require much more substantive detail than short praise. Here, we have a name attached to a question, but we don’t really know the context (or even whether this is what that person actually said). So I propose here to look at the Amit version of this as an isolated utterance, and react to it.

---

Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

If we are talking about computing in general, this is an example of “category confusion” — likely a confusion with simple marking media such as paper, and not about computers at all. However, we could also say that the category confusion is deeper than that, and is with programming or even meaning itself.

We can also squint at this from another angle with the observation that in much of computing over its history *pragmatics have dominated semantics (including often obliterating the latter).*

Or we can note that we have nothing — for markings — with humans or computers — if we don’t have process …

If we take a simple marking — such as sine(x) — we have a certain meaningful mathematical relationship in mind that our representations and uses of it in computing should not lose touch with. If we have an “x” in mind, then we want whatever this is to give us a number representation that is close enough to the mathematical meaning for our purposes and intentions. Note that in a computer language it doesn’t matter that this is always internally represented by a pattern of bits: there is a process (generated by a program) whether we think of this as “data” (it could be a table) or “program” (there are programs that can generate approximations of sine(x)). It could be a mixture (an array for integer degrees and a program to compute the rest). In any case, “data” is not fundamental.
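A minimal sketch of the sine(x) point, in Python with invented names: the same mathematical meaning can be served by a table (“data”), a program, or a mixture, and the caller only ever deals with a process.

```python
import math

# "Data" part: a table of sine values for integer degrees 0..90.
SINE_TABLE = {d: math.sin(math.radians(d)) for d in range(0, 91)}

def sine_degrees(x):
    """Return an approximation of sin(x degrees), however it happens to be represented."""
    if isinstance(x, int) and 0 <= x <= 90:
        return SINE_TABLE[x]              # served by a table lookup
    return math.sin(math.radians(x))      # served by a program, computed on demand

print(sine_degrees(30))    # from the table
print(sine_degrees(30.5))  # computed
```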

In a more “database” case, let’s consider the query date-of-birth(Bob) (notice that “sine(x)” could also be considered a query). Here we might be using this “base” as a way to store and retrieve properties of things. The category confusion here (which was swept under the rug of pragmatics) is that “age-of” is also a property of a thing. In both cases we are semantically dealing with a property, and we expect a number of some kind as the functional association between the property and the thing/person mentioned.

Note that not only do we need “age-of(Bob)” to be computed at the time of the query (from date-of-birth(Bob) and current-time), but we should also ask the question of whether we want the “retrieval” to be a mere number. The mere number will eventually lose its meaning.

Maybe the result of the “retrieval” should be a process that will always try to mean “age-of(Bob)”. Note that this could be considered a “type” (and it would be really valuable to have such types) but that this is far beyond most conceptions of types after 80 years of programming computers.
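Continuing the hypothetical example (the record and names below are invented for illustration): a sketch in which “age-of(Bob)” is retrieved as a computation from date-of-birth and the current time, rather than as a stored number that goes stale.

```python
from datetime import date

# Made-up record: the stored fact is the date of birth, not the age.
DATE_OF_BIRTH = {"Bob": date(1990, 5, 17)}

def age_of(person, today=None):
    """A 'retrieval' that is really a computation from date-of-birth and current time."""
    today = today or date.today()
    born = DATE_OF_BIRTH[person]
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))

print(age_of("Bob"))  # always answered relative to "now", so it cannot go stale
```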

Are we going to (a) ignore this, (b) have special purpose patches of code, or (c) meet it straight on and see that “data as organization of bits” is too weak an idea to survive even the earliest historical needs for computing?

Intel might not yet understand this — etc — but computing needs to. The Alto at Parc didn’t have floating point hardware. Smalltalk at Parc did have many kinds of numbers and would — behind the scenes — compute the meaning (and convert automatically) when necessary (if a primitive failed to be found in the HW). This was because “humans think first in terms of magnitudes” (and we can deal with the many ways of representing and computing these as mostly invisible optimizations).
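A hedged sketch of the “primitive failed, compute the meaning anyway” pattern described above — this is not Smalltalk’s actual implementation, just a small Python illustration with invented names of trying a fast path first and converting behind the scenes when it doesn’t apply.

```python
class PrimitiveFailed(Exception):
    pass

def fast_int_add(a, b):
    # Stands in for a HW primitive that only handles one representation.
    if isinstance(a, int) and isinstance(b, int):
        return a + b
    raise PrimitiveFailed

class Magnitude:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        try:
            return Magnitude(fast_int_add(self.value, other.value))   # try the "primitive"
        except PrimitiveFailed:
            # Fallback: convert to a common representation and compute the meaning anyway.
            return Magnitude(float(self.value) + float(other.value))

print((Magnitude(2) + Magnitude(3)).value)    # primitive succeeds: 5
print((Magnitude(2.5) + Magnitude(3)).value)  # primitive fails, fallback computes: 5.5
```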

A key idea here is: we are always computing even when we just might think we are “using” a “data structure” — something has to run according to a scheme in order to “do” or “use” anything. Once this is grasped, then it can be turned to our great advantage to start recapturing semantics in ways that “mere marking media” can’t (note what “meaning” implies if a human comes upon a marked up paper… — “something has to run according to a scheme …”).

How about as a slogan: *The center of programming is meaning*?

---

I generally write too long for Quora, so I’ll pause here. There is much more that can be said in large terms about “conceptions of what computing is, especially within the field — and how this can impede or aid progress in both thinking and tools”.

This is one of those “Point of view is worth 80 IQ points” issues (and where the sign bit can be “+” or “-”).

Mr. Alan Kay. You’ve said Smalltalk was about messages, not objects, and about creating a medium for thought. Do you think modern software has lost that goal—and is it still possible today to build systems that truly support human thinking?

This is several questions—which don’t have a strong overlap.

The “messages” part — and programming techniques in general — can largely be summed up by this (oversimplification): there has been no corresponding “Moore’s Law” for software development and systems. We have seen factors of billions (even trillions) of scale in capacity and elementary computing speeds. By contrast, not much has happened—especially with regard to engineering considerations—since the 1980s. (I am struggling to avoid a too long screed here!)

Much can be achieved by trying to discard present views and simply look at the engineering demands (and dangers to human life) brought by scalings. A thought as simple as “Anyone can make a doghouse which will not fall in on the dog, but trying to scale most methods by only a factor of 100 will be very dangerous—even fatal—for the dog” can help start thinking off in a much better direction.

---

By its nature making things with materials is also a medium for thought. The problem is that the qualifying adjective is missing. Since “most ideas are mediocre down to bad” just making is not likely to produce thoughts that help produce better worlds (and usually the opposite!).

The ideals of the 60s were to shape things so that human thought was qualitatively improved in important ways. I still think that it is possible, but it is very likely that—as Montessori pointed out at the beginning of the 20th century—most humans aren’t able to learn “epistemology in a classroom” (most learn their outlook on knowledge, how to find and vet it, whether to believe or test, etc. from growing up in a culture with its outlook manifested in the culture). Montessori’s insight was that therefore the form of schools—not so much their courses, etc.—had to be a culture that embodied the best ways to think, and be presented to the children as a culture (not a syllabus).

In a school with this kind of thinking—that has chosen science as one of its facets for outlook—computers can really help if used very very carefully. This has many parallels to the pitfalls of trying to use books to help children learn science. First and foremost, science is about the relationship (and negotiations) between the external world (which produces phenomena) and our internal worlds (which assign meanings and beliefs to what we think the phenomena signify). We can understand why it is so difficult to learn real science (and why it took so long to even discover it) by contemplating just how close to solipsism are our brain/minds, and why there are so many notions about us and the external world.

My favorite Einstein quote (to the Prussian Academy of Sciences, January 27, 1921) was: " ... as far as the propositions of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality."

(He said this because he realized that even scientists frequently drift into confusion about how representations/attempts at map-making, etc., relate to primary sources of phenomena.)

What goes on in computers—like mathematics—is also hermetic, and thus is not a great basis for internalizing the outlook of science. As with trying to teach music from a book without having the learners sing and play or even hear sounds, this quite misses what things are actually about.

The reality of what’s needed puts a great burden on most families and most public schooling. It involves a lot more work by everyone concerned. This has not happened.

To return to the book as an analogy: some books will inspire some children (and some parents and some teachers) to start the physical and mental actions that will help them take on the outlooks of the most important human creations: not just science, but music and other arts, and the really difficult to handle “softer but critical sciences” (such as anthropology, etc.). “The softer the science the tougher the scientist has to be!”

In long ago talks I used to point out that “The music is not in the piano”. Acting as though it is misses the actual Art form.

Can you come up with a parallel saying for the computer? “The ____ is not in the computer!”

??

---

Added Jan 11th 2026

Quite a few commenters responded with “parallel sayings” from the above template.

I should explain that I put this in to highlight a difference between a real teacher and a “question answerer” (I am much more of the latter than the former). My friend Neil Postman (RIP), when a grad student, used to follow McLuhan around from talk to talk, and noted that Marshall never responded directly to a question or a criticism, but instead said “How about this one?” and would come forth with another paradoxical Zen-Koan-like aphorism. Neil said “I finally realized that McLuhan cared most about whether the audience was thinking at all — and was thus doing what was most important about a great teacher helping students”.

I like to answer questions — as well as I’m able — but almost never take the time and care to try “real teaching” — it’s not clear that it fits into the Quora style and expectations. This experiment was interesting, and thanks to all who responded.

How was the design process in Xerox Parc? How does Alan Kay's process look in his creative work? What does he think about the current design processes?

First a simple answer to the question, and then to why the question needs to be changed.

Ans: The design process at Parc was basically systems-oriented and derived from best practices, mostly from the ARPA community in the 60s. The most important addition to this was that the Parc researchers agreed amongst themselves to “engineer every invention to be made and usable by at least 100 users”. I.e. the Ethernet had to support 100 devices, the personal computer we made had to be manifest as at least 100 machines (actually it grew to about 2000), SW systems had to be supported and documented for at least 100 users (so outside the research group), etc. Another feature common to all the projects at Parc was “never a fatal error”, meaning not losing work regardless of where failures occurred (this was done pretty well via many different solutions).

---

Marshall McLuhan famously said “Don’t ask whether something is Good or Bad, Right or Wrong, True or False—try to Find Out What Is Going On!”.

He meant that judgments are what humans like to make, but they forget that they are always relative to contexts, many of which are invisible to the point of being thought of as “normal reality” rather than arbitrary beliefs.

For example, the consequence of most bugs at Xerox Parc and earlier was not much more than daily annoyances—today the consequences of too many bugs are human deaths (e.g. CrowdStrike’s victims were not just millions of computers for days, but about 2500 hospitals with emergency rooms and IRs, the 737 Max Autopilot, etc.).

The context has changed since the great research decades of the 60s and 70s. In extremis, we could say that the main theme in computing research back then was helping humans think more clearly and strongly, and, today, the main theme for the 21st century could/should well be “human safety”, both with regard to computing, and with the industrial revolution in general.

However, the generally buggy nature of HW and SW back then (and the minuscule capacity that tempted “logical brinksmanship”, as Dijkstra put it), along with fairly simple OSs that were understood to be poor, etc., meant that many of the best researchers back then were highly conversant with systems and how they were different from algorithms, and—partially aided by the designing of the ARPAnet, Ethernet, and Internet, the Apollo SW, transmission errors in general, etc.—were very concerned with detecting and correcting errors, and especially with finding ways to allow critical systems to be “eternally up”.

This was still thinking in the small contexts of the time—but the needs were visible and the researchers could see them.

There has been a qualitative shift in scaling, reach, and potential harm since then, to the point where we need to introduce new contexts for thinking about “What’s Going On?”.

A large principle is “Duty Of Care”, which has been in use in Medicine and Engineering for centuries, but which wasn’t much of an issue in computing 50–60 years ago (or today!). The central idea is that we should have the right to expect more than “average human abilities and fallibilities” from our physicians and structure and vehicle builders. We don’t want people who can only apply Band-Aids to do open heart surgery. We don’t want people who can only make a dog house to design our apartment buildings and airplanes. Etc.

The awareness of a large context change should change what and how we are judging. For example, we could be saying about today’s so-called “AI”: it doesn’t matter whether it is good or bad, right or wrong, or true or false; what’s important about it is the reach and scaling of how humans react to such a new (to them) and not understandable mass media technology, one that has no provisions for “Duty Of Care”, no “Surgeon General’s warnings”, and can easily be tied into other non-Duty-Of-Care disasters such as social media.

We could say—similarly about CrowdStrike—what matters is not what it was supposed to do (use “AI” to ward off outside SW attacks), but that it is a widespread independent set of processes with no sense of human beings, of what it is, of what its intentions need to be, etc. In other words, it is one of many “AI”s that is powerful but completely lacks any sense of what it should and should not be doing—this is a kind of highly dangerous stupidity. Thousands of old style unit tests will not help.

These are a few of many cases of Einstein’s observation that “We cannot solve our problems using the same kinds of thinking that we used to create them!”

I don’t have any impression that many (most?) of today’s companies and their computerists are either (a) even thinking about this, or (b) up to the task if they were.

What does Alan Kay think about Azerbaijani jazz?

I had not heard of “Azerbaijani jazz” until asked this question. So I looked it up in Wikipedia:

https://en.wikipedia.org/wiki/Azerbaijani_jazz

This was not very informative. However, YouTube (as usual) came through with many examples I listened to — these were from various periods starting in the 1930s through today.

I realize that I don’t know enough to glean whether the YouTube examples were typical or not.

I quite liked much of what I heard. This is because my jazz tastes involve definable harmonic structures with improvisation over the chord patterns. And this is what I heard here. I’m not so fond of “passage work over cluster chords” (which is one of the ways American jazz went after bebop).

To oversimplify, the harmonic structures and substitutions I heard are very recognizable and nice to hear. The melodic structures retained some of the character of traditional Azerbaijani music — in many cases transferred to the piano — and this was interesting to hear: carrying the tremolo playing of plectrum instruments over to the piano led to very novel lines, and — combined with the rich harmonic structures — I found it very pleasing to listen to.

Was the work of Adele Goldberg and others, introducing children to programming with Smalltalk, not seen as a duplication of Papert's work at MIT? Why a new teaching language? Why not just teach the children LOGO?

A short answer to the question first, because the context turned out to be long.

We did duplicate a fair amount of Papert’s and Solomon’s work at MIT — as scientists usually do for (a) learning the details of the work, and (b) to see if we could get the same results (both of these worked out).

A series of new teaching languages came from both MIT and Parc because (a) we didn’t know what would work best, especially for the deeper goals both groups had, (b) multiple perspectives are generally a good route for both solving and, especially, problem finding, (c) we shared ideas and arguments because we got more synergies on what we didn’t understand that way, and (d) this worked well because neither group tended to rivalries.

Neither LOGO nor Smalltalk was the answer to what we wanted to accomplish; they proved to be just starting points.

---

As is so often the case, the actual past history of ideas is more complex and a longer story than fits in a “short answer” forum like Quora.

The Association for Computing Machinery (ACM) has sponsored longer histories. You could look in their History of Programming Languages (HOPL) series for a more detailed sense of how ideas came about, were shared and used, etc. written by the main inventors. For example HOPL I has LISP and JOSS and Simula. The first two were the main precursors of LOGO, and all four came before and were big influences on Smalltalk (whose history is in HOPL II). The HOPL IV book has extended histories of LOGO and a large separate history of Smalltalk implementations.

LOGO itself was first made as a combination of LISP power and JOSS friendliness by Wally Feurzeig and Danny Bobrow at BBN to help children learn about human languages. There is a book by them that precedes the “Seymour era”. Seymour Papert was a consultant there and Cynthia Solomon worked there. Seymour had a special inspired way of looking at things, and Cynthia could make things happen, especially with children.

It was seeing this work in 1968 that got me to switch my interests from adults and personal computing to concentrating on children. This was primarily because Seymour’s big idea was that children making things out of “computer stuff” at an early age could positively affect their world view of modern thinking, especially mathematical thinking. I thought this was truly a great insight, and that his and Cynthia’s approach was terrific.

I had a degree in mathematics, and this helped me understand the great path they were undertaking. However, I also had a degree in molecular biology, and this brought some of the perspectives of scientific thinking to my own ponderings. One of these was the tradeoff between math and science, well expressed by Einstein in 1921 in a lecture to physicists in Berlin:

" ... as far as the propositions of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality."

My take on this is that learning “pure reasoning” is quite dangerous for humans without understanding that it is hermetic and needs to be grounded in science (which is “hermetic with warnings attached”).

Smalltalk came from notions about the usefulness of “multiple similar parameterized entities” that would allow many ideas to be thought about as instantiations of whole dynamic ideas. My first thought after seeing what Seymour and Cynthia were doing was the idea of a children’s tablet computer with the children being able to program a game they would like to play — like SpaceWar — that would require them to investigate and model some of how the real world seemed to work — and that both spaceships and “torpedoes” could be modeled as a single idea if you could instantiate ideas.

Instantiation of objects appeared very early in Sketchpad (‘62), and then Simula (‘65). Instantiation of dynamic processes appeared in the early 60s in multiprogramming and time-sharing OSs. My early idea (‘66) was that this could be used as a universal way to describe and build arbitrary processes in a clean fashion. I used it in my PhD thesis (‘69) and then started to see if there was a children’s language that could be made. The first one I did was called “SLOGO” for “Simulation LOGO”.

This was a very fun time in computing — both in general and especially for computerists who could see that it could be specially for children and education. There were no rivalries, everyone had different points of view, and freely shared their favorite approaches.

Everyone realized — more or less — that (a) different children needed different approaches, (b) Montessori was a pretty good starting place in general, (c) Piaget’s and Bruner’s approaches to children’s thinking were also rich perspectives, (d) no one knew enough, and (e) lots of actual work with children would be required, etc.

For example, “we knew that we didn’t know” what kind of programming system would help children be more than programmers (it had already been noted that “programmers” didn’t automatically acquire much if any special perspectives and outlooks beyond simple “instrumental” ones — which were even very often counterproductive).

I was very lucky to find and attract Adele away from Stanford to be a co-manager in my group at Parc. She was a complete powerhouse in all directions, had superpowers, could do everything, etc. (she later won the ACM Software Systems Award, and still later became the second woman president of the ACM, is now in the prestigious Computer Museum Hall of Fellows, etc.).

We did duplicate a lot of what Seymour and Cynthia did — both to learn, and to have enough context for additional thoughts. Adele thought of a great way to teach objects directly (and first) that was very fruitful, and led to the children being able not just to program, but to make tool frameworks (this was partly because, once you understand instantiation, many things that are difficult to handle in a data-procedural language, become quite easy).

In fact, neither LOGO nor Smalltalk were good enough for the original visions. Both groups continued to make new designs. The head systems “machers” were Brian Silverman for LOGO and Dan Ingalls for Smalltalk. (Dan won the Grace Hopper Award for inventing BitBlt, the ACM Software Systems Award for Smalltalk, and is now in the prestigious Computer Museum Hall of Fellows, etc.)

Later, important pedagogical work involving LOGO was done by Brian Harvey (his three books “Computer Science Logo Style” are three of my favorites).

Mitchel Resnick did an important thesis that involved “massively parallel” entities that were instantiations (StarLOGO). Hypercard by Bill Atkinson and Dan Winkler had many important insights. We did many systems also.

The basic thrust here is that “when you don’t know how to do something, do many things!”

At Disney we tried a new angle, which resulted in the Etoys system (programmed in Smalltalk). This used a very uniform approach to instantiable entities and an easy approach to parallel models. This turned out to be extremely simple and powerful at the same time.

This was the first one that I felt was workable for the kids — 25 years after the start of the project! — and could really embody the mathematical/practical outlook of science. Mitchel Resnick wanted to do a follow-on, and we got NSF to fund it. The result was Scratch (and it was first programmed in Smalltalk).

One can learn a lot from observing what happened with Scratch: (a) it was very successful, and it’s possible to say why; (b) it is a very large example that there has to be a well thought out curriculum to go along with a new tool, or the results will be mostly mundane (which they were).

Is it possible to design a self-consistent computational universe where a small Forth-like meta-model provides the semantic substrate, similar to how Smalltalk provided an object substrate?

Sure. You could start your thinking and learning by looking at how LISPs have been implemented.

There are lots of possible meanings for “meta” here, including: reflection, the kernel in itself, etc.

It’s worth pondering that a really useful kernel needs to start off with “What is actually needed” perspectives regarding allocation of storage & levels, processor, safety, security, etc. (Most computer folks go directly at programs and computation, but — paradoxically — the “goodness down the road” will depend as much on these properties as it does on what most people think of as “computation”.) You basically have to redefine “computation” and implement it as part of your virtual machine to have a decent chance with the scalings we now have to deal with.

I would try to read and understand the first 100 pages or so of “The Art of the Metaobject Protocol” book for useful perspectives. Another “first 100 pages gem” is the first 105 pages of David Fisher’s 1970 CMU thesis: “A control definition language”.

FORTH already has a fairly simple/easy bootstrap from scratch on a new machine. Another “meta thought” would be how to avoid having the kernel definitions be in machine code (and specific to the machine) and, instead, find a machine independent way to define the kernel — presumably you’d like to have it be FORTH — but to have enough extra mechanism in your architecture to allow you to write a meta-translator that would make the resulting system machine-independent.
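A minimal sketch, assuming only a toy dictionary-based kernel (this is not any real FORTH, and all names are invented), of how small such a semantic substrate can be: a handful of primitives that would ultimately need something non-FORTH underneath, plus an outer interpreter that lets the system extend itself by defining new words in terms of old ones.

```python
# Toy Forth-like kernel: a stack, a dictionary of words, and an interpreter
# that can extend the dictionary with ": name ... ;" definitions.

def forth():
    stack, words = [], {}

    # The few "primitives" -- the part that would need something non-FORTH underneath.
    words.update({
        "+":   lambda: stack.append(stack.pop() + stack.pop()),
        "*":   lambda: stack.append(stack.pop() * stack.pop()),
        "dup": lambda: stack.append(stack[-1]),
        ".":   lambda: print(stack.pop()),
    })

    def step(tok):
        if tok in words:
            words[tok]()               # execute a known word
        else:
            stack.append(int(tok))     # anything else is a numeric literal

    def interpret(source):
        tokens = iter(source.split())
        for tok in tokens:
            if tok == ":":             # ": name body ;" defines a new word from old ones
                name = next(tokens)
                body = []
                for t in tokens:
                    if t == ";":
                        break
                    body.append(t)
                words[name] = lambda b=tuple(body): [step(t) for t in b]
            else:
                step(tok)

    return interpret

interpret = forth()
interpret(": square dup * ;  5 square .")   # prints 25
```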

Smalltalk first used kernels implemented in machine code, but then was able to write the kernel in itself, and then a compiler into lower level forms — this isolated the non-Smalltalk part to a very small part of the system.

Some of the experts who have done great work along these lines are: Dan Ingalls, John Maloney, Ian Piumarta, etc.

(This latter process can be a lot of fun to try to figure out elegant and simple ways to deal with the fact that — at some point — you will have to grapple with something that isn’t FORTH in order to get it running on a given computer (the idea is to keep this both minimum and as simple as possible).)

All of the greats and experts in CS and programming say that A. Kay's Smalltalk is the best PL and PE ever. If it's a clear winner, then why is it not used widely in the science and industry?

In “The Early History of Smalltalk” written for SIGPLAN as a ramp up to HOPL II, I made it clear that the Smalltalk that was released — and is the one people mean today when they say “Smalltalk” — was much more due to Dan Ingalls — for much of its design and implementation — and to Adele Goldberg — as a co-manager of our research group — for getting it out in a form that could spread and be understood. I started the project and wrote the first “math for a Smalltalk” that showed how to do a minimal bootstrap, and contributed a few more things.

Dan, Adele, and I were jointly awarded the ACM Software Systems Award for this.

Smalltalk was originally invented in the 70s to be used in the 80s with HW architectures that we thought would follow the Burroughs architectures into VLSI CPUs. (These allow very high level languages to be small and efficient.)

This didn’t happen, and C wound up being a much lower level language approach to dealing with what Intel, etc., did provide. VHLLs like Smalltalk and LISP suffered …

Amongst these giants, who is, according to you, the most important person in the history of CS due to their contributions: N. Wirth, D. Knuth, K. Thompson, G. van Rossum or J. Gosling?

It doesn’t really work to “try to sort ‘giants’ (as you yourself have termed them)”.

In music, what is the point of trying to rank Bach, Beethoven, Mozart … ?

I do like to be able to claim that “X is great”, and have that mean they are above a special threshold of contribution to their area of art.

I think this gives us a way to recognize and still keep everything simple.

If Alan Kay were to modify his Squeakland.org reading list, what would he add or remove? http://www.squeakland.org/resources/books/readingList.jsp

I was originally — in the 80s — asked by a company for my top 10 books. I responded by saying most heavy readers like me don’t have a top 10, and then made the mistake of providing about 500 titles that I thought were worth reading. This list has passed through many hands since I last touched it. Now we are about 40 years later, and of course I’ve kept on reading …

However — on reflection — I realized I should have replied with something more general than a list — that the list I provided was much too much about my own educational catalyzers, rather than how to help others find their own pathways … .

I think the best use of books is to give us a wider range of ideas to possibly follow up than we might have by ourselves. Because any written down sentences can be prefixed by either or both of two phrases — (a) “It is definitely true that …” and/or (b) “It is definitely not true that … “ — we can readily see that books can be about lots of ideas and assert many different sides of any question …

This means that books — writing — is very dangerous without also setting up ways to deal with the ideas, especially ones about the universe we seem to be living in. Modern science provides some very useful tools for not getting confused (as Einstein pointed out to a group of scientists in 1921 when he said " ... as far as the propositions of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality." Geometry and Experience, Lecture before the Prussian Academy of Sciences, January 27, 1921.)

But it’s clear that we also need something similar to deal with ethics and morals — and this is much more difficult because it is tantamount to needing to create (design) the ethical equivalent of the physical universe which is already here. This “psychical universe” has to be generally agreed on in order to work (the nice thing about the physical universe is that it can be investigated from scratch).

In any case, what is needed to deal with ideas — and especially in the context of humanity — is something that works much better than blind belief in whatever ideas are appealing. All of us are humans in a single species, and we all need this. However, we are all interestingly different: to the extent that we need to find our own intellectual journeys to build “better than mere believing minds”.

When I was a kid, I used to roam the stacks in our public library — and my grandfather’s book shelves — to read books “on general principles”. As I mentioned in another Quora answer, many of the books I thought were not very good actually contributed to the whole experience (in part because they provided a view of the larger human experience, and parts of it seemed to be really misguided — which meant that the parts I liked could be misguided also!).

And all of this helped with what reading brought to my own growth experiences …

Because — as humans — we suffer from many cognitive biases — and a deep one is “confirmation bias” — it’s critical — no matter how much we think something is true — to maintain a little area where doubt can operate. This little patch in our minds is the soil in which progress in thinking can be grown.

This is what I wished I had told those who originally asked me for “my list” …

What were the advantages and disadvantages of having complete control over hardware and software designs at Parc in the pursuit of developing personal computing?

Larry Stewart’s answer to a related question covered some of the useful details (he was also at Parc and a central part of many projects there).

First, we didn’t have “complete control” over the HW — for example, during the most productive decade of Parc we almost exclusively used chips made by others (most would be called “medium scale ICs”), and the only VLSIs we used were for memories. So e.g. most CPUs etc., were board sized or several boards of MSIs. Virtually all of the HW critical architectures were completely designed and built at Parc.

There were various ways to look at this. My slogan was “Hardware is just Software crystallized early”, and was thus an intrinsic part of the software design process.

The stated aim of much of the HW at Parc was to act as “time machines”: using money to make machines today which Moore’s Law would allow to be made many orders of magnitude cheaper in the future. The idea was to develop the SW of the future now by being able to run it on “supercomputer” versions today of commodity computers tomorrow.

This made a lot of sense along many dimensions. And that particular time period had just introduced the critical memory HW (MOS DRAMs), which just happened to run 5–10 times more slowly than the much faster MSI chips. This meant that a microcoded CPU and small superfast memory could be much faster than the general RAM, and that meant that the machine would be able to emulate in microcode many functions that formerly required HW — and further — that if the microcode memory was dynamically loadable (at Parc it was) — then one could emulate many different environments and functions at will.

For example, each of the VHLLs done at Parc had its own microcode, BitBlt (the screen painter) had its own microcode, and so did the various music/sound processing, etc.

These skills were critical in making HW such as the Ethernet, the laser printer, and their interfaces, etc.

The advantages of all this should be obvious. Even though Moore’s Law would eventually catch up, this approach provided a many years head-start and extreme flexibility by allowing future SW to be thought about as a whole system. Since we were aiming at new kinds of SW, we needed the time and flexibility (and also not to optimize and work around old HW) etc.

The disadvantage was that we had to design and make all the tools, building, testing, etc. In practice, the Parc researchers were more than up to this (quite amazingly so in my opinion), and a number of the most important systems were done in terms of months.

Why do you think people today are less likely to explore programming on their own compared to the early days of computing?

The simplest answer is that in the early days of computing with limited access to computers, those who did explore were really interested, and a lot of them were explorer-types and self learners.

Numerically, I would guess that today the absolute number of self-explorers is much higher (just from volume and accessibility alone).

The percentage could even be a bit higher, but that would need careful surveying to estimate.

Interesting question!

What would be the best way to learn Smalltalk in a Linux environment?

The most used Smalltalks — for example, Squeak Smalltalk — are “bit-identical” — meaning they present exactly the same GUI, system, and computation on any of the dozens of platforms they run on.

Who is called the father of modern computers?

Please tell me in a comment what time frame you mean by “modern” and what you mean by “called the father” (the way it is phrased here implies something like a social vote rather than an appeal to history).

---

For now, I’ll take “modern” to mean “20th century”, and “called” to mean “is”.

My personal pick would overwhelmingly be Turing. Besides all the interesting theoretical and mathematical ground he covered in his 1936 “On Computable Numbers” paper, it was his “rather engineering” approach that wound up having the greater impact.

As von Neumann later pointed out, it was the universality of interpreting symbols (and symbols of interpretation) placed in the context of simple machinery that suggested a practical route to actually making a universal computer. (What we today call a “von Neumann computer architecture” is actually an almost direct adaptation of the approach Turing used in his paper.)

Both for engineering and philosophy, it turned out to be very important that the design suggested “almost no machinery” coupled with lots of “simple marking memory” — this greatly helped both the practical building of physical computers and ways to think very large thoughts about layers of abstraction.

In the early days of personal computing, what one platform - if successfully commercialized - could have set computing on a better trajectory than the 1980s micros did? - Details in comments.

I have an urge to provide a decent answer here, but I don’t think I know how to.

This is primarily because (a) of the ways that systems within Moore’s Law can or cannot be developed, and (b) because I think you mean “personal computing” when you say “computing” (and now we are expanding an idea into an even less savvy market than the mainstream market for various sized computers that existed before the 80s).

At Xerox Parc — and even a bit before in the ARPA community — because of the way we thought, did prototyping, etc., we could write down basic numbers for how many bits of what kinds and how many cycles of what kinds would be needed for our conceptions of personal computing to be possible (e.g. it’s important to understand that ARPA-IPTO made personal computing systems on mainframes of various sizes — some of them really large and expensive (“personal computing” is a collection of experiences, etc.)).

The advent of the Intel 1103 DRAMs changed the costs for one very important part of the kinds of experiences we were thinking about. The Alto was a gamble that we could do enough to make real progress in practical personal computing even if some of the other system needs did not come through in time. This barely worked out, but the fact that it did was significant.

It’s worth looking at markets at various times and noting what they were willing and able to pay. An important early market in the mid-70s was for stand-alone dedicated word processing machines. A much more general “Alto III” could have been made and sold for a profit at the same price point as a dedicated word processor ca 1976–7. That would have been revolutionary for a while — and would certainly have changed the perception and perspectives of the market. This would have been “personal computing in business” (not “personal computing in consumer-land”).

In the world of consumers, people were willing to pay a fair amount for a house (usually by borrowing), and still a fair amount for an automobile (also usually via a loan). Most products called “consumer products” were much cheaper. Note that the automobile metaphor is worth looking at more closely. If you understood what a computer was and what it could do, then you might be able to see it as your personal information vehicle, and be willing to pay the average price of a car ca 1980–2 for one (this was about $9.5K).

There was at least one high quality system at that price happening around that time: the Apple Lisa. My opinion of it was high, and I was happy to use them after I went to Apple. However, the large market wouldn’t accept it, and settled for doing much less by paying much less via the IBM PC. This became a huge barrier for the rest of the decade and beyond.

There were a few late-70s/early-80s machines we’d call “workstations” today (and because consumers didn’t want to pay what the Lisa was worth, we have to include the Lisa as a kind of low cost workstation). Besides the “Dande…” workstations made by Parc, there were also several really impressive machines made by 3 Rivers Corp (initially designed by Brian Rosen, who had spent time at Parc). There was the “3M” initiative by CMU (1 Megapixel, 1 MIP, 1 Megabyte of RAM) — this is kind of what was needed, but it did not fit into what the larger markets were willing to pay.

An interesting “better than Mac” that could have been put in the Mac space was the Amiga …

Moore’s Law predicted that really good personal computers would eventually reach price points that the mass markets would accept. A good idea would have been to wait a few years — but one way to characterize “consumer businesses” is “what can you sell in mass to unsophisticated markets for reasons other than ‘quality’?”

The big problem with this is that bad de facto standards are going to happen — and, much much worse, many of these will stick (and some for no good reason at all).

This was especially the case in the 10 years between the early 80s and early 90s. Moore’s Law did go more or less as predicted (and in some important cases — like file storage — vastly exceeded expectations). But many important things needed in operating and development environments and assets and UIs for users, etc., stayed mired in their early weak conceptions of the early 80s. The Web browser was done especially badly (and the Web itself was much too weak an idea).

Basically, no matter what most people thought was going on in the 80s, that decade was really the decade of “locking into not-so-good” and then pretty much staying “not-so-good” (which means “ever worse”) until the present.

A really interesting question — I think — is whether the general unsophistication of most markets (and even most computerists) could ever have been dealt with in a much more positive manner. What we actually got was “a will of the people”. This is what happened with Covid recently, and with the climate over the 60+ years that experts have known the depth of the problem.

It just hasn’t mattered that — for many important things — experts often do have better ideas about issues, directions and methods. The bizbiz coupled with unsophisticated markets is and has been driving.

How did early software developers manage to create efficient programs with such limited computing resources?

Basically: “every which way” they could think of — and this was actually “highly motivated” by the limitations of computing resources.

For example, I have often shown movies of the amazing 1968 Engelbart “Mother of all demos” with its subsecond response times, pointed out that it was done on a half MIP, 24bit, 192KB time-shared computer with multiple users, and asked “how was this possible?”. Only once in years of asking this question have I ever gotten an accurate answer (interestingly from a student in an audience of undergrads): “Because they *wanted* subsecond response?”

Yep. That was one of their principal goals, and they kept it, no matter what.

Another way to look at this is that the instruction times of even really slow computers, even running interpreters (and losing another factor of 10 or so), are much faster than human nervous systems, so even for interactive systems in the early days — such as “George” on Whirlwind in 1954 — one could get quite good real-time response (if one really wanted it).
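As a rough back-of-the-envelope illustration (my numbers, purely for scale): a “half MIP” machine executes about 500,000 instructions per second, so a 100 millisecond “feels instantaneous” budget contains roughly 50,000 instructions — and still around 5,000 interpreted operations after losing that factor of 10 — plenty for echoing a keystroke or updating a selection, provided the system is organized to spend its cycles there.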

And — in general, whether real-time or batch — many of the earliest programmers were also mathematicians, and so had a feeling for both “abstraction” and “deeper meta”.

Also — a bit of apples and oranges here — the most stringent limitation most of the time was “amount of available RAM”. This tended to dominate how programming needed to be done (basically, if you have to swap/overlay from secondary storage, you have killed your average cycle time). One of the most amazing examples of this was done by Hal Laning (the original creator of “George”) for the extremely slow and limited Apollo Guidance Computer for the moon shot project. The basic computer had a memory cycle time of ~16 bits in 12 microseconds with a RAM of 4KB; however, many operations were actually done by an interpreter whose pseudocode ran at about 50 IPS (!)

Another interesting case from the old days is when Engelbart’s group moved to a faster PDP-10 timesharing system, and discovered that a “byte-coded” interpretive version of their implementation language (much slower in simple benchmarks) actually ran much faster than optimized compiled code in practice, because the tinier p-codes could fit much more easily into the memory slicing-and-dicing scheme that the Tenex OS used — thus vastly reducing the need to swap — and thus vastly improving the “system dynamic” and “user experience/perception”.
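Here is a minimal sketch in Python of why a compact byte-coded representation can win when RAM and swapping are the real bottleneck (this is a toy encoding of my own, not the actual NLS/Tenex one): the whole program below is ten bytes of pseudocode, where equivalent compiled code would occupy many machine words.

```python
# Toy byte-coded interpreter (hypothetical encoding, for illustration only).
# Compact programs trade a little interpretive speed for much less RAM,
# which on a paged machine can mean far less swapping.

PUSH, ADD, MUL, PRINT, HALT = range(5)

# "(2 + 3) * 4" as ten bytes of pseudocode
program = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, PRINT, HALT])

def run(code):
    stack, pc = [], 0
    while True:
        op = code[pc]; pc += 1
        if op == PUSH:
            stack.append(code[pc]); pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop(); stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop(); stack.append(a * b)
        elif op == PRINT:
            print(stack[-1])
        elif op == HALT:
            return

run(program)   # prints 20
```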

One of my favorite “bang for the buck” systems from ca 60 years ago was the Meta II compiler-compiler done on an 8K 6-bit byte IBM 1401 computer by Val Schorre at UCLA. This worked because he had really thought through the problem space and had come up with a tiny meta-characterization that had enough range to deal with a wide range of higher level languages (this is basically the “Math Wins!” idea). He was also able to apply the meta-idea to itself to get the pseudo-code for the system that a simple interpreter on the 1401 could run.

He wrote a classic small paper that was able to give explicit implementable examples of the meta system, a teaching example, and two sample “Algol-level” languages. The system in itself is fun to contemplate:

image

This is the link to the entire paper

Basic idea here is that there is often — even “usually” — some “meta math” under most problems. Spending some time trying to articulate and represent this will often result in much tinier, easier to understand and debug systems.
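Here is a minimal sketch of that flavor in Python — my own toy notation, not Schorre’s actual Meta II syntax: the grammar is just data, and one tiny generic matcher interprets it, so extending the language means adding a rule rather than adding machinery.

```python
# Toy "grammar as data" matcher (illustrative only; not Meta II's notation).
# Each rule is a list of alternatives; each alternative is a sequence whose
# items are either literal tokens or the names of other rules.

rules = {
    "expr":   [["term", "+", "expr"], ["term"]],
    "term":   [["factor", "*", "term"], ["factor"]],
    "factor": [["(", "expr", ")"], ["num"]],
    "num":    [[d] for d in "0123456789"],
}

def match(rule, toks, i=0):
    """Return the index just past a successful match of `rule`, or None."""
    for alt in rules[rule]:
        j = i
        for item in alt:
            if item in rules:                        # nonterminal: recurse
                j = match(item, toks, j)
                if j is None:
                    break
            elif j < len(toks) and toks[j] == item:  # literal token
                j += 1
            else:
                j = None
                break
        if j is not None:
            return j
    return None

toks = list("(1+2)*3")
print(match("expr", toks) == len(toks))   # True: the whole input parses
```

The real Meta II went further: its rules also carried output actions to emit code on the fly, and the rule language was rich enough to describe — and so to regenerate — itself.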

Computer scientist Edsger W. Dijkstra said that “It is far more likely that the programming occupation will become extinct than that programmers will ever approach professional status.” What did Dijkstra mean by that and do you agree with Dijkstra?

I’ll start by saying that I don’t clearly understand the English as simply stated.

However, three things might help us (a) English was not Edsger’s native language (b) he was very proud of his skills in English (c) he delighted in coming up with snide remarks (one of his good friends was Bob Barton, who also liked to do this).

I read this as “programmers will never approach professional status” — and — “the programming occupation will never become extinct (too bad)”!

He desired the process of getting computers to “do what we mean!” to be rather like the mathematics he was used to when getting his PhD in mathematical physics. In his own words:

After having programmed for some three years, I had a discussion with A. van Wijngaarden, who was then my boss at the Mathematical Center in Amsterdam, a discussion for which I shall remain grateful to him as long as I live. The point was that I was supposed to study theoretical physics at the University of Leiden simultaneously, and as I found the two activities harder and harder to combine, I had to make up my mind, either to stop programming and become a real, respectable theoretical physicist, or to carry my study of physics to a formal completion only, with a minimum of effort, and to become....., yes what? A programmer? But was that a respectable profession? For after all, what was programming? Where was the sound body of knowledge that could support it as an intellectually respectable discipline? I remember quite vividly how I envied my hardware colleagues, who, when asked about their professional competence, could at least point out that they knew everything about vacuum tubes, amplifiers and the rest, whereas I felt that, when faced with that question, I would stand empty-handed.

He had “bitten the apple” and gotten hooked. And the dual attractions and repulsions of math on the one side and programming on the other stayed with him his whole life.

It was his pristineness towards the former that got him to disparage the latter — even though that was the path he also had taken …

Was one byte ever less than eight bits in the past?

When I started “programming as a job” in the US Air Force (ca 1962), the most general meaning for a “byte” was “a contiguous field of bits in a word that was smaller in size than the word”.

I still remember feeling annoyed when IBM redefined the term to mean an 8-bit entity (I remember this from 1964 when System 360 was announced — however, IBM could have forced the redefinition before this).

The Wikipedia article is rather wishy-washy about this. I think that my definition in the first paragraph is actually correct.

https://en.wikipedia.org/wiki/Byte#:~:text=The%20byte%20is%20a%20unit,memory%20in%20many%20computer%20architectures.

Is programming learned inductively or deductively?

I don’t feel I completely understand this question (or, perhaps, don’t understand the intention behind the question).

But we can certainly start by comparing various kinds of reasoning already addressed in Quora:

https://www.quora.com/Whats-the-difference-between-inductive-deductive-and-abductive-reasoning

Let me add an additional one or two: “projective reasoning” — this is similar to “abductive reasoning”. It is familiar in both mathematics and programming as a kind of “feeling” about a situation (in programming it often occurs as a “sense of difficulty or ease” when a problem is mentioned). It is also likely related to what is called “forward reasoning”: to take definitions/premises and drive them “forward” using logical operations to reach conclusions (so: similar to deduction, but driving “outwards”).

My first paying job as a programmer/coder was in 1962 — so 63 years ago as of this writing — as an enlisted man in the US Air Force. They had a test to screen for possible aptitudes. Those who passed were trained by IBM in a single dense wall to wall week, and then sent back to base to write and debug code (i.e. the rest of the training was “on the job training”).

From my own personal experience, I would say the answer to this question is “All of the above (including my additions), and more”. One example of “more” is that it was common enough back then for the hardware to have glitches — sometimes “intermittent glitches” — so that one could not completely rely on executions being deterministic.

I think I might instead say that “Learning programming is both learning how to “solve problems”, and especially how to “find problems”, mixed in with much more “design” than one appreciates at first”.

What are your thoughts on a future where code is represented as a structured model, rather than text? Do you think that AI-powered coding assistants benefit from that?

First, it’s worth noting that the “syntax recognition” phase of most compilers already does build a “structured model”, often in what used to be called a “canonical form” (an example of this might be a “pseudo-function tree” where every elementary process description is put into the same form — so both “a + b” and “for i := 1 to x do […]” are rendered in the same kind of structure, i.e. with “operator like things” at the nodes …)

Another canonical form could be Lisp S-expressions, etc.

These structures are made precisely to allow programs to “reason” about some parts of lower level meaning, and in many cases to rearrange the structure to preserve meaning but to make the eventual code that is generated more efficient.
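A hypothetical sketch of such a canonical tree in Python (the node names are made up for illustration): both an arithmetic expression and a loop are rendered as the same shape of structure — an operator-like thing at each node with its operands as children — which is exactly what lets one piece of code walk, reason about, and rearrange either of them.

```python
# Hypothetical canonical form: every construct is (operator, operands...),
# so the same traversal can reason about arithmetic and control flow alike.

def node(op, *args):
    return (op, *args)

# "a + b"
add_expr = node("+", "a", "b")

# "for i := 1 to x do [...]" -- same operator-at-the-node shape
loop_stmt = node("for", "i", 1, "x", node("body", node("+", "a", "b")))

def operators(tree):
    """Collect the operator-like things at the nodes of any canonical tree."""
    if not isinstance(tree, tuple):
        return []
    op, *kids = tree
    found = [op]
    for kid in kids:
        found.extend(operators(kid))
    return found

print(operators(add_expr))   # ['+']
print(operators(loop_stmt))  # ['for', 'body', '+']
```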

Long ago in the 50s this was even thought of as a kind of “AI” and this association persisted into the 60s. Several Turing Awards were given for progress on this kind of “machine reasoning”.

Most coding assistants — with or without “modern AI” — also do reasoning and manipulation of structures.

It’s important to realize that “modern AI” doesn’t understand human level meanings any better today (in many cases: worse!). So it is not going to be able to serve as much of a helper in a general coding assistant.

A slogan that might help you get past the current fads is:

NOT DATA … BUT MEANING!

Why did Japan’s Fifth Generation Computer Systems project fail commercially? Are any of its concepts and advances presently being utilized?

The answer by Marcus Triska sums up much of the history and context rather well

https://www.quora.com/Why-did-the-Japanese-Fifth-Generation-project-to-reinvent-computing-based-on-parallel-logic-programming-fail-And-what-can-be-learned-from-it-Are-there-results-that-might-be-useful-to-us-now-Eg-interesting-operating?no_redirect=1

I will just add to this a few impressions from the early years of this project. The Japanese project leaders were quite open about the project and invited Ed Feigenbaum to select 20 or 30 American computer scientists to visit periodically and discuss the project, etc. This group included Ed, John McCarthy, Marvin Minsky, many other luminaries in the field, and a “fringe member” (myself).

These visits were quite fun, and the American group itself got along quite well (we all liked to argue in a fairly reasonable way).

The Japanese scientists in the 80s adopted the sensible approach of ARPA/Parc in the 60s and 70s of trying to define and write enough SW to be able to design HW that would optimize difficult important parts that were not served by conventional architectures. The large scale and ability of the Japanese government to both fund and “coax” companies to cooperate at various levels and standards started off well.

Some of the details were doubtful (IMO, etc.). Some of us felt that it would be a better idea to invent a new language that would be more suitable than PROLOG as a base for the large goals (there were a number of ideas in PROLOG that were quite good, and some that needed to be reinvented better). One of the eventual fallouts was Shapiro’s “Concurrent Prolog”, which was a much cleaner, better, way to approach this genre of ideas.

I meant to add this in the original answer: Every computerist should know about and have read with a “research eye” Kazunori Ueda’s excellent paper: “Logic/Constraint Programming and Concurrency: The hard-won lessons of the Fifth Generation Computer project”. Many of the issues and suggestions here are very worth pondering in the light of today’s problems (i.e. you will get a lot from reading this regardless of whether you agree or not).

A perhaps too simple bottom line is that quite a bit of good research got done — but there was also a lot of constriction from the pressure to move quickly to showing practicality. It might be possible to do this at the pace that was attempted, but I’ve never witnessed it. The real learning curves in my own experience actually — unfortunately — take much longer than the pace of business.

I think this project — overall — helped computer science in general and Japanese computer science in particular. To me it was a research project, and did turn up considerable knowledge of various useful kinds.

How does Alan Kay describe the nature of his way of collaboration? Would he be willing to work with teams that are below average, which could diminish the quality of his work and lower his motivation?

I don’t know the answer to this (I’ve always tried to work with colleagues who are more able than I am — going back to playing in bands, theatre, tennis, science, etc.). Also, ARPA-IPTO had the interesting idea of creating groups of smart people where no-one was the smartest in the room (I think most of us still thought it was Butler Lampson).

I don’t know how one would go about determining “below average” … a much better idea and question could be about “less able”, where the goal would be to help the growth of “more able”. This is definitely a better way to think about children, and I think it works for adults also (in other words, keep the potential of others in mind and alive).

A key notion of “team” is that every member winds up able to get over the bar (because every member is helping every member get over the bar).

A big shift I made in my 20s was moving from rating how I was doing based on “results” to assessing “quality of effort”. This worked much better because I was much more in control of “quality of effort”; and not worrying about “results”, plus higher “quality of effort”, tended to produce higher quality results (but now I didn’t care so much about the latter, etc.)

Sorry for the not very good answer.

Since von Neumann invented the merge sort algo in 1945, just before ENIAC was introduced in 1946, what does Alan Kay think motivated him to explore sorting algos when computers were still so new and HLL didn't yet exist?

Richard Norman has already provided a nice short accurate answer to this question.

The notion of “algorithm” goes back at least to the early Greeks (and several are given by Euclid).

Another area that extensively used algorithms before programmable digital computers was “Relaxation Methods in Engineering and Science” — the classic Southwell books date back to the 1940s.

Many of the early processes that ENIAC ran were ballistic calculations that had been done for years by humans using desktop numerical calculators.

HLLs are not needed for programming or for carrying out algorithms. However, a kind of HLL (Flowcharting) was used in many areas to provide an HL portrayal of various calculation schemes.

"Great visions are not goals, but the ultimate cat toy". What does this idea mean to Alan Kay? And could it relate to your belief that ‘you don’t truly understand something if you only understand it one way’?"

I’m trying to understand this question. I used the quoted phrase above in several talks (many are on YouTube) to explain how JCR Licklider in the 60s was able to get so many highly talented researchers to work on interactive computing and world-wide networking.

The lead-in to this phrase used two fun videos I found on YouTube. The first was a take-off on westerns showing cowboys trying to herd cats.

https://www.youtube.com/watch?v=m_MaJDK3VNE

I then showed a video made by two smart children showing how to get cats to do things.

https://youtu.be/3BZmuP3gyQA?t=31

(In other words, come up with cat toys if you want cats to do things.)

(Researchers are Artists, and thus/also rather like cats.)

Come up with a vision that attracts them and they will start working on it — etc.

(I think you can see that this line of attack is only peripherally related to the second quote (which was by Marvin Minsky) and which relates in some way to most other ideas …. )

What does Alan Kay suggest here: "Computing is not about clocks, it's about biology". How come the essence of computing is more about the way living organisms evolve?

I wouldn’t have said the “quoted phrase” in the way it is stated (so the quotes likely mean something else). And I wouldn’t have said “The essence of computing is more about the way living things evolve” (I would have used a term like “work”, instead of “evolve”).

In many talks, long ago, I have shown a picture with the title “From Gears To Biology” as a metaphor for a qualitative shift in thinking about computer systems — from early-bound, tightly coupled rather small mechanisms — like clocks — to late-bound, loosely coupled rather large mechanisms — like living organisms. In computing, this happened within a single lifetime, and many parts of computing are still stuck in the earlier metaphor.

image

We can see the difficulties that general computing has had in adjusting to the great scalings brought by Moore’s Law. For example, Facebook/Meta “going off the Internet”, and the recent “Crowdstrike” (what a strange title for a computer system!) debacle.

Pondering the qualitative differences between gear-like systems and biology-like systems can be very helpful in thinking more clearly about dealing with great scalings in computing systems.

What does Alan Kay think about ADHD in adults? Should it be controlled with medication?

I don’t have the medical background to answer generally.

However, I have many creative close friends who have ADHD. Most of them use the drugs to help (they do help greatly) to cope with many actions in the world that are difficult for those with ADHD.

When they are deeply within creative phases of their life that work well off the drugs, but not well with them (for example, acting in theatre), they often go off the drugs. I have a very close friend who uses the drug in the morning, and lets it wear off for theatre rehearsals in the evening.

These are just anecdotes, and certainly don’t cover the general case.

However, all of my ADHD friends look at the drugs as a godsend, and all whom I know use the drugs wisely.

What is Alan Kay’s view on analytic idealism by Bernardo Kastrup?

I’m sure that I need to read his forthcoming book in order to attempt an answer:

Analytic Idealism in a Nutshell: A Straightforward Summary of the 21st Century's Only Plausible Metaphysics

I’ve ordered it — it is not out yet — due to arrive Nov 1st this year (2024).

However, I should reveal a personal prejudice against what I understand to be the basic idea. As with most prejudices, it doesn’t have much of substance behind it, but I’ll be trying to keep this in mind when I read the book.

See you in November …

---

Here are my reactions as of Nov 6th.

I got his book — Analytic Idealism in a Nutshell: A Straightforward Summary of the 21st Century's Only Plausible Metaphysics — and have read it.

The ideas in it are put forth as an essay into Philosophy, particularly focused on Metaphysics (the nature of Being itself).

Wikipedia’s definition is good enough:

Metaphysics is the study of the most general features of reality, including existence, objects and their properties, possibility and necessity, space and time, change, causation, and the relation between matter and mind. It is one of the oldest branches of philosophy.

I wonder if I understand enough about the general subject area — Philosophy in general, and Metaphysics in particular — to make it worthwhile to share my opinions? My thought patterns are primarily within the general outlooks of science, math, engineering, and some of the arts: musical, visual, theatric, literate, etc.

I like — and subscribe to — Einstein’s reminder to scientists in his 1921 talk in Berlin:

" ... as far as the propositions of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality."

Geometry and Experience, Lecture before the Prussian Academy of Sciences, January 27, 1921

Kastrup starts out with his version of this idea — one I’ve also used many times in talks — that

what we think of as “out there” is actually going on in our brains: “in here”, between our ears.

We create a kind of a map that is its own internal world, and — if we are sophisticated — we realize that our map should not be called “reality”, and at best we have to negotiate between the limitations of our mappings and the phenomenal evidence we can detect. This internal world each human has is sometimes called our “Private Universe”.

Kastrup likens the internal map to a dashboard inside a hermetically sealed airplane where the instruments provide enough information to fly the plane, but do not at all resemble what we’d see if we could look outside the plane (this is a quite good example/analogy).

Kastrup adds several other postulates in his Introduction. Here are his additional four (partially quoting):

There is an external world out there, beyond our physical minds

This world unfolds on its own, according to its own inherent dispositions, and reveals some phenomena our senses (and sense aids) can pick up

Human reason can recognize and model what it thinks are regularities in these phenomena, and in some cases can predict future phenomena

Complex phenomena can be “sufficiently accounted for in terms of simpler ones” (basically non-linear reductionism).

I think most scientists — including me — would agree that these five are highly likely. Kastrup calls these realism.

The writing style of this book is nicely clear, but very repetitious. There is an air of “I need to explain this many times because you are probably not understanding”. The last part might very well be the case, but repetition doesn’t help.

And, he starts arguing right away. It is not at all clear to me that his arguments (a) work, and/or (b) perhaps can be made at all. I am prejudiced in favor of essays which spend a lot of their front matter in exposition and follow this groundwork by argument. This is not done here.

I will try to fit the rest of this within a Quora-sized (albeit one still too long) answer.

One way to approach this is to ask whether his initial premises — which I agree with — actually allow his thesis — that Being is a kind of universal mentality that is very unlike the internal mappings that physical scientists try to make of Nature — to be successfully argued.

In Philosophical terms, the 5 premises above are essentially epistemological. Wikipedia again:

Epistemology, the philosophical study of the nature, origin, and limits of human knowledge

Taken together, it looks as though these premises of Analytic Idealism make it difficult to do more than claim any ultimate knowledge about anything “out there”. I.e. I think that Kastrup can claim his thesis as a proposition — but, given the premises, I don’t think he can substantiate his claim. As Einstein pointed out, logic/math/language/inference with the aim of “certainty” with regard to chains of thought will not refer to “reality” but only — if done as well as possible — to the consistency of the arguments.

For example, an argument of the form “because the thing in question is not this, this or this, it must be that” only works in reasoning/logic/math, etc., if you can first show that you have enumerated all the candidates and eliminated all but one.

The needed enumeration is not done here, and I don’t think it can be done.

A book that I’ve enjoyed very much — and which provoked much mulling — was Julian Jaynes’ “The Origin Of Consciousness In The Breakdown Of The Bicameral Mind” (TOOCITBOTBM). It is perhaps my favorite of this kind of book. I doubt its conclusions, but thinking about the issues, evidence, and forms of argument has quite widened my thoughts over the years.

Analytic Idealism In A Nutshell is also a book that provokes mulling. The subject matter is a few levels more murky and abstract than TOOCITBOTBM, but trying to understand what is being attempted and pondering whether its arguments actually hold water can be quite enjoyable, and to some extent, illuminating.

How does Alan Kay view Don Norman's perspective on humanity-centered design?

I’ve known Don Norman as a colleague and friend for more than 50 years, and I value all his perspectives whether I agree (most of the time I do) or not (sometimes I don’t).

image

Don has had a long illustrious career centered in Cognitive Psychology and associated “human factors”.

https://en.wikipedia.org/wiki/Don_Norman

Don was a consultant for “The Learning Research Group” that I set up at Parc, and contributed greatly to many parts of our research thinking. He liked to argue (in good ways), and so did we, and this wound up covering lots of ground. He and I also worked together at Apple in the 80s. We have remained friends.

Etc.

A bottom line is that Don’s ideas are a must for anyone thinking about systems which involve people, and especially for systems where attempts are made to better fit them to human users and uses.

I’m being a bit elliptical here because the area of “actual UI design” — though there are a few principles — is still done best by many experiments with many design variations. This means that a new UI designer needs to absorb an enormous amount of lore — especially from great designers — and has to be willing to do the large amount of new work to bring a new UI to useful life.

Don’s perspectives are thus deeply important to learn and understand.

Why didn't Doug Engelbart's chording keyset become a standard interface like the mouse did? It requires more training, but would have a dramatic impact on user performance.

Here’s the classic Engelbart NLS interface:

image

This was the result of testing many alternative designs. A basic principle was that the hands should be “out” for navigation and “small text” (so “out” required all text to be able to be entered with the five finger keyboard and three buttons on the mouse), and the hands should be “in” for “longer text” (paragraphs etc) using the standard keyboard.

I used to be able to use this system, so I’m trying to remember some of the interesting details. One of these was that “command accept” <A> was the far right button on the mouse. This left five buttons for chords with the left hand, and the left two buttons on the mouse for “shifts”.

The “no-shift” default was for commands to the system. So “move word” from one place to another was done as:

MW <point and click> <point and click> <A>

This is a procedural syntax, is thus modey, and requires an explicit <do it> to terminate (an “object syntax” would select first and drag the word and not need the rest). One of the reasons they stayed with this was that the analog nature of the underlying displays was “drifty” (they used small electrostatic CRTs for better quality with an 875 line TV camera strapped onto the front to blow the image up to a reasonable size — this was called “Mexican scan conversion” back then — now non-PC — it worked well enough).

Telling the system what you were trying to point at greatly helped the locator algorithms to find targets in its drifty displays.

Learning the system was kinesthetic, and took about 10–20 hours over a few days to internalize. For example, at the start it wasn’t about learning the chord for “M” or “W”, but the muscular motion for “Move Word”. At first when there was an opportunity to type, one would go to the keyboard — but very soon one would just try the chord keyboard — etc. Fluent users could type up to about 30 words a minute with “hands out”.

In practical use, this scheme worked pretty well — much of the time was spent with “hands out” navigating, typing “small things”, etc. and the response of the system was sub-second, which encouraged one to get expert. The chord keyboard and two left mouse buttons collected “downs” and sent them as a 7 bit character when all went back “up”. This worked well. The experience was like low level flying at high speed.
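A rough sketch in Python of that accumulation scheme — my guess at the mechanism from the description above, not the actual NLS hardware or firmware: each of the five keyset keys and the two left mouse buttons contributes one bit, the “downs” are OR-ed into an accumulator, and the 7-bit code is emitted only when everything has gone back up.

```python
# Sketch of chord accumulation (assumed bit assignment, for illustration only):
# bits 0-4 = the five keyset keys, bits 5-6 = the two left mouse buttons.

class ChordAccumulator:
    def __init__(self):
        self.accumulated = 0   # every bit that went down during this chord
        self.held = set()      # what is currently held down

    def key_down(self, bit):
        self.held.add(bit)
        self.accumulated |= (1 << bit)

    def key_up(self, bit):
        self.held.discard(bit)
        if not self.held and self.accumulated:
            code, self.accumulated = self.accumulated, 0
            return code        # 7-bit character sent only when all are back up
        return None

acc = ChordAccumulator()
acc.key_down(0); acc.key_down(2)   # two keys pressed together
acc.key_up(0)                      # nothing emitted yet
print(acc.key_up(2))               # 5 (binary 0000101)
```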

Note that the initial ramp is a steep learning curve, but in a week or so you wind up with a “violin” rather than a “kazoo”. The system didn’t have anything built in to help beginners (it was an experimental system, and they were writing code for expert use). It could have been learned by children via a game interface, but that was never done.

One of the plans at Parc ca 1971 was to “do NLS on a network of minicomputers” — and there were several notions of how to do this. For example, every Alto came with a three button mouse and a five finger keyboard.

image

The Parc Alto with chord keyboard and 3 button mouse

Bill Duvall did a terrific subset of NLS on a Nova 800 with Parc’s experimental display HW, and it rocked. He later did an Alto version when the Altos started to become available. Larry Tesler also used the chord keyboard — not in the way NLS did — for his word processor experiments.

However, there was never a complete — really useful — system done at Parc that made central use of the chord keyboard, even though it was an excellent idea. You had to learn it to use NLS at SRI but you didn’t have to use it for the most used systems at Parc — so it didn’t get deeply incorporated into any of the day to day Parc systems. One of the reasons was that quite a lot of the UI research and design at Parc was concerned with general novice end-users, and not with experts who would have explicit training as part of their introduction to the system.

It’s worth noting that text entry on an iPhone is not a good way to do it for expert use — so we are given a novice scheme, that — ironically — the end-users then put many many hours into optimizing (but optimizing a bad scheme almost never gives you a good scheme — what you get is a slightly faster version of the bad scheme).

This is a quite general phenomenon — and is found in easy start up musical instruments (like guitars, etc.). Many “pretty good” guitar players retain poor technique that has been optimized (for example, playing lines without the pinky on the left hand, and never including the pinky as they get better — at some point this creates enormous barriers for further progress).

We find it in many school subjects, where the early versions are presented so badly that what is needed later seems like an entirely new subject (a good example is that “algebra” is really a kind of arithmetic that focuses on what “=” really implies — but for most kids it seems like something they have never seen before).

It is not unfair to point out that “marketing people” do not like to sell anything that has a real learning curve — and this has gotten even more restrictive over the years. If we extend that idea to schooling (most teachers and educational institutions don’t like to teach subjects that have real learning curves), then we can see why so many children and adults are “in deep stew” today …

How valuable is it to apply concepts like the Curry-Howard correspondence (proofs as programs) and inductive proofs to the development of dynamic/complex systems, considering their correlation with type systems?

One of Marvin Minsky’s many thought provoking sayings was “The trouble with ‘New Math’ is that you have to understand it every time you use it!”. (Meaning, there is an important pragmatic place for mentally caching “rules that work” rather than working through the reasons for every use. Both are important, and some separation helps.)

Also, I will admit to a prejudice against trying to align multiple languages with multiple ways to assign meanings (this was — and is still — a big problem with formal proofs of regular programming languages).

So, I’d argue for trying to design single languages that can serve the dual purpose of being “good to program in” and that automatically try to verify the intentions of the programmer.

I think this is a really interesting and important problem for computerists to work on. If we think in older terms — the spectrum between assembly/C type languages and Lisp/FP type languages — the latter work better, but I think in the end are too “pristine” and not very good at systems — and the former are not expressive enough and are full of pitfalls.

Some form of “real AI” is likely to be too annoying to really help, but it’s hard to imagine pulling off the goals here without an expert system of some kind to help.

Going way back, Ivan Sutherland’s Sketchpad continuously showed you the consequences of the intentions you expressed. A lot of the time, this would lead to finding the requirements you were trying to express, winding up with what you wanted. But some of the time it would lead to guessing and superstitions and “poking at your system with a stick”.

An interesting wrinkle here is that Sketchpad partially succeeded by including the idea of “engineering/scientific tolerance” — its inferences were not exact, but the processes were set up to minimize global error measurements. This allowed it to have inherently non-linear relationships, etc.

So: an extremely important “Grand Challenge” for (real) Computer Science and (real) computer scientists!

What is the importance of Christopher Alexander’s work for system/software designers?

I think the most useful of the Alexander books is the first one, “Notes on the Synthesis of Form”. It was his PhD thesis at Harvard, and has thought provoking discussions of systems and design.

I don’t think his books are useful as direct guides to how to go about things in computing — and that the popular “Pattern Language” books are a bit misleading when attempts are made to apply them to computing.

In NOASOF, his main example was to try to design a new village in India by gathering many hundreds of constraints of all kinds — physical, biological, social, aesthetic, etc. — and see how to wind up with a coherent design. This thought process — including ideas in the book that work well (and those which don’t) — is invaluable for computerists to go through.

Worth the effort.

---

Petre Dămoc asked such a good question as a comment that I’m copying it here:

What about his magnum opus (Nature of Order)? Especially the 15 principles described in Book One. My intuition is telling me that these principles are extremely important for interaction design.

My reply was:

I should have mentioned these four books. IMO the best part of these books are very like the best parts of NOASOF — extremely valuable to motivate one’s own pondering.

I think of them in the same vein as I do Julian Jaynes’ “The origin of consciousness in the breakdown of the bicameral mind” — i.e. probably “not ‘true’ “, but invaluable for provoking thought.

My advice here is to take the forms of these thought provoking books and — instead of following the contents — make up new principles to see how the idea space can be covered (and it’s certainly OK to use some from the ideas of others).

My favorite modern “computery thinker” is Bret Victor — partly for his ideas per se, but also that he has been able to use what was valuable from the past, yet to recast the past in the light of where we are right now and his own high quality instincts, to indicate how we should be probing into the future. I don’t think that Bret has missed much that is valuable, but what he does is still “New” (rather than incremental “News”).

---

Prompted by a comment of Ali Krgn, here is an overall observation:

Architecture is definitely a good source of analogies for other subjects, including computing. However, I very much prefer “comfortable surroundings” to “comfortable ideas” (I’d rather be around “uncomfortable ideas” much of the time, but would not like this much in my surroundings). This is why I take C. Alexander with grains of salt — too much analogy can start missing the main points by miles.

What is the difference between soft computing and nano computing?

I had to look up both of these terms (neither was in use back when I was most active).

You could look them up yourself, but as long as I’m already writing …

(a) “soft computing” is defined as “employing the kinds of thinking methods used by humans and other animals”. This used to be called “heuristic methods”, “fuzzy logic”, etc.

(b) “nano computing” is defined completely differently: the use of very small components making a very small computer. (I thought it would relate to some modern version of “microcoding”, but apparently not.)

How much RAM did computers typically have when you first started using them?

I realize that I didn’t completely answer this question. I’ve now split it into 2 parts: [1] my personal experience in 1962, and [2] the more general state of things in the US at that time.

[1]

The first real computer that I programmed as a job was the IBM 1401 in 1962. It had 8K* 6-bit characters (each character had a 7th “word bit”).

* Because it was a decimal machine, 8K meant 8,000 not 8,192 …

I have written elsewhere on Quora about the 1401 — but Quora search is “kind of abysmal”. (I have answered fewer than 700 questions, and one would think that I (or you) should easily be able to search them …).

However, until I locate these **, here is the Wikipedia article. It is pretty informative, but lacks:

(a) that the 1401 was originally an “outlaw project” done in a back room, then grudgingly sold, then becoming spectacularly successful,

(b) our experiences (and meta experiences) with a small one in the US Air Force Air Training Command computer center in the early 60s,

(c) mentioning at least one truly significant work of meta software *** done on the 1401.

https://en.wikipedia.org/wiki/IBM_1401

** Thanks to Mark Cleary here is one of my previous Quora answers about the 1401:

https://www.quora.com/What-was-it-like-programming-an-IBM-1401-back-in-the-60-s/answer/Alan-Kay-11

*** One of the wonderful “meta things” done on the 1401 was Val Schorre’s 1964 “Meta II” system at UCLA, a compiler-compiler that would run in an 8K 1401. I wish I had known about this when I was a 1401er, but its invention happened after I went back to school. It will be very worth any computerists’ time to read this short but loaded with information paper from 1964.

Meta II was a very clever way to use recursive procedures to parse grammars that were complex enough to describe programming languages and to emit translated code on the fly. The paper includes Meta II in itself (all in just a few lines), two complete examples of algebraic type languages, and a complete description of the virtual computer you need to make to run the examples.

If I’m not mistaken, I recall that Val’s 1401 at UCLA also only had 8K characters.

Doug Engelbart’s “Mother of all Demos” was programmed in a language — MOL — that was generated by a variant of Meta II (called “Tree Meta”).

[2]

A typical “scientific” computer ca 1962 was the IBM 7090. I’ll quote from the Wikipedia article:

The first 7090 installation was in December 1959.[1] In 1960, a typical system sold for $2.9 million (equivalent to $23 million in 2023) or could be rented for $63,500 a month (equivalent to $501,000 in 2023).

The 7090 uses a 36-bit word length, with an address space of 32,768 words (15-bit addresses).[2] It operates with a basic memory cycle of 2.18 μs, using the IBM 7302 Core Storage core memory technology from the IBM 7030 (Stretch) project.

I never wound up having to program the 7090/4 series. But it could be fun here to compare it with its biggest competitor at that time: the Control Data 3600, designed by Seymour Cray.

A typical system sold for $1,200,000 ($9.5M today — so less than half the IBM 7094), was a 48-bit machine (and could have a max memory of 256K words), and was a bit faster.

I wrote quite a few programs for this at the National Center for Atmospheric Research in Boulder, starting in 1963. IBM called Control Data “Brand X” (possibly because it was so good they didn’t want to even mention the name).

This was developed into a whole line of 3000 series machines, and CDC made a lot of money during this time period.

https://en.wikipedia.org/wiki/CDC_3000_series

What happened to the "parc copies" created by Xerox at PARC? Have they been preserved or recycled?

I wrote an answer to an equivalent question:

https://www.quora.com/Was-anything-meaningful-lost-for-good-when-Xerox-threw-out-their-old-PARC-data-disks

What is the origin of the name "The Internet"? Was it chosen as a play on words with "interconnected networks"? Were there any other names considered before "The Internet" was chosen?

JCR Licklider — an experimental psychologist and (ca 1959/60s) fan of interactive computing — was given a lot of money by the DoD to set up a long range computer research community. One of his first memos looked like this:

image

There is a good story (I’ve told it elsewhere in Quora) about why he called it this.

The first pass at this was a large experiment in packet-switching: the ARPAnet. This was instigated by a successor of “Lick” — Bob Taylor — and made to happen by Larry Roberts.

What should be done next was pretty unanimous, and was started by Larry Roberts. It was generally called “Inter-networking”. You can guess what this was shortened to eventually.

Xerox Parc actually had a working inter-networking system that spanned the US before the official Internet — it was called “PUP” — for “Parc Universal Packets” — and Parc researchers were part of the larger Internet design committee whose main force was Vint Cerf (but which was a truly amazing group effort of top designers who wanted to work with each other).

I would give Steve Crocker the main credit for the way the networking research community was organized — and so would Vint (this came up when we had dinner a few weeks ago).

---

Looking at some of the other answers has prompted me to say a bit more here. The ARPA community did not regard the ARPAnet — which sent its first packets in 1969 — as an “internetworking” scheme. But, as mentioned above, the desire/plan to develop internetworking was already in the air.

Both Bob Kahn and Vint Cerf worked on the ARPAnet, and in the early 70s started to produce proposal versions of what was to become TCP/IP.

Thanks to Steve Crocker and the way the network design process was set up within ARPA, there are a series of “Requests for Comment” documents that provide a wonderful history of the ideas. (“Request For Comment” was thought to be less aggressive and more friendly than “Requests For Proposals” that were the norm in DoD.)

The Ethernet was done at Parc ca 1973 to connect computing resources there, and very quickly some of the Altos were programmed to be what we would call “routers” today, and were used to internetwork Ethernets, the ARPAnet, etc. The protocols were called “PUP” (as mentioned above).

There is also a somewhat funny story told by others and myself about how the first wireless use of the ARPAnet (using an old Ford bread truck full of gear) was used to transmit reports to Washington DC from one of Palo Alto’s outdoor beer gardens. This was soon used to connect to Don Davies’ packet network at the National Physical Labs in the UK. Here’s a talk I did that summarises a fair amount of the work back then:

https://youtu.be/wdHtYW_wcAs?t=260

How would Alan Kay go about designing a systems design course? What contents should be covered, and what knowledge or skills should one gain? Which exemplary projects/papers should be included to facilitate understanding?

In the true fashion of the 1960s-thru-1970s ARPA-IPTO research community (which included Parc), I would try to find the best people who could do this, and help find funds to support them.

I would try to help form a joint vision large and comprehensive enough to excite and guide.

Compared to what I think is needed, there would only be a few ideas that I could help with (and I think most of the central people we’d get as PIs would already have them).

One big notion and set of questions would be how to go about helping all of us — especially the students — internalize a much better “systems consciousness” (i.e. better “feelings” and “guesses” about systems, etc.).

I started to “think systems” from getting a degree in molecular biology, and first being fascinated by what was called “Embryology” back then (now “Morphogenesis”), then Biochemistry, and then when Watson’s great book “Molecular Biology of the Gene” appeared in 1965. At the time I was working my way through college as a programmer of supercomputers of the day for the National Center for Atmospheric Research — I didn’t know anything important about computing or computing systems, and there was some, but little, overlap between the weather models, computer systems, and systems in general.

Today, there are a variety of ways to use computers to get started on “thinking systems” — especially in both cognitive and visceral ways. I don’t think I know all of them at this point. There are “massively parallel simulators” (like StarLogo, etc.) in which it is easy to set up simple conditions that generate complex non-linear behaviors.

The Apple “Vivarium” Project was aimed at helping children learn to make and view the world via “systems glasses”.

There is Bret Victor’s Dynamicland, which creates a wonderful systems world in which one is embedded and which one can influence in easy and powerful ways. I would definitely try to create early experiences and projects of many kinds using this kind of media.

One of many threads running through these ideas would try to help learners transition from normal commonsense (and wrong and weak) notions of static stabilities to the large idea that any stabilities are actually dynamic stabilities, and thus processes of some kind are necessary to keep desirable properties within tolerances, etc.
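A minimal sketch of that idea in Python (purely illustrative numbers): the “stable” value below is not a static fact but the ongoing result of a process that keeps nudging it back toward the target against a constant disturbance — switch the process off and the “stability” disappears.

```python
# Toy dynamic stability: the value is disturbed every step, and only a running
# corrective process keeps it inside the desired tolerance band.

import random

target, tolerance = 20.0, 0.5

def step(value, correcting=True):
    value += random.uniform(-0.2, 0.2) - 0.05   # disturbance plus steady drift
    if correcting:
        value += 0.5 * (target - value)         # the maintaining process
    return value

value = target
for _ in range(1000):
    value = step(value)
print(abs(value - target) < tolerance)          # almost surely True: held in band

for _ in range(1000):
    value = step(value, correcting=False)       # switch the process off...
print(abs(value - target) < tolerance)          # ...and it drifts away: False
```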

And then there is scaling …

And then there is …

Etc.

Basically (I think) there are many things that would be good to do and learn before computer systems themselves become the object of study. Since much of the early ideas would also be reified using computer systems, being able to later “pop the hatches” of these could be a great start, etc.

There would be much more to this — and far beyond the scope of Quora (and my personal scope). I should have added at the beginning that I would normally think about this for months before proffering ideas — but the fun (and pitfall) of Quora is to see what can be done as a single stream of response.

It would be a great “grand challenge” project!

What is Alan Kay's perspective on the obsession with learning memory management and other "low-level" aspects when it comes to programming education?

I react to this question in the way I react to questions that are essentially about “how much of a subset of something important can we isolate as a separate teachable subject before we destroy the connection to the ‘something important’?”

For example, to what extent can you isolate “reading and writing” and even “literature” from ideas?

It is not unlikely that most educational systems in the world do just such a separation — and, if so, I don’t agree with it at all.

Programming is often quite isolated from systems design (and from systems themselves) because it is often taught in isolation as “data structures and procedures and algorithms”. I think this is quite a bad approach. (And systems are not the only larger ideas that need to be understood and heeded e.g. integrity and “before the ‘fact’ engineering” *)

The question before us falls into the really interesting category of “runnable math” — meaning that we’d like to think of most programming and systems design in terms of their “relational requirements” rather than what is often needed to convert these into runnable and usable systems.

I’m old enough to have been brought up in a mathematical, scientific, and engineering world in which part of the years of learning were spent on really understanding and being able to use fundamental principles and materials both when needed and to ground any higher levels of understanding.

My prejudice is definitely that this is still a very good — imperative — idea. It is still being done in the established fields of maths, sciences, and engineering.

However, there have been explosions of knowledge in many fields — including computing — so it is definitely worth pondering how to allocate our learning time to trying to understand much larger fields.

I have written a few Quora answers praising the Nisan and Schocken “from nothing to an interesting program” approach.

https://www.amazon.co.uk/Elements-Computing-Systems-Building-Principles/dp/0262640686

These days I think I would try to work with them to take a second pass at this wonderful work they’ve done. (This stuff can’t be praised too highly, but I think an even larger perspective and a fresh pass would yield great dividends.) For example, the approach to the HW could be made a fair amount simpler without losing any rigor, and the goals for the SW part could be greatly expanded without requiring more than a bit more total bulk of code.

That would be a really interesting project, and also a good way of answering the slightly different and larger question that could have been asked here. I.e. say “Yes” via radically changing what it means to “learn to program”.

---

“before the ‘fact’ engineering” is a phrase coined by Margaret Hamilton, who was head of the Apollo Guidance Computer mission SW (everything except for the OS). The ‘fact’ referred to any kind of error, especially crashes. The idea was that the engineering’s main goal was to prevent ‘facts’ from happening during a mission.

The evidence is that there was never a SW failure over all the Apollo missions, despite the enormous limitations of both the HW and SW media they were required to use.

https://en.wikipedia.org/wiki/Apollo_Guidance_Computer

What does Alan Kay think of Ted Nelson's criticism of Xerox PARC? (Taking away the "right to program" by hiding everything behind the "PUI" and making computers into paper simulators with single-column text)

Ted has done a lot for personal computing, but occasionally gets things wrong. I got invited to give a talk for his 70th or 75th birthday, and decided to use a 1978 Xerox Parc Smalltalk system to show what we really did, and to praise him at the same time. The talk starts around 2:15 if the URL doesn’t take you directly there:

https://www.youtube.com/watch?v=AnrlSqtpOkw&t=136s

What do you think is the future of software engineering? Could advancements in technology eventually render it obsolete, or will these advancements serve as valuable tools that enhance the work of software engineers rather than replacing them?

Here’s part of a slide I used in a talk in 2021 expressing a kind of “Hippocratic Oath” within Engineering:

image

The first two have been around for thousands of years. The ones for airplanes and software more recently. It’s worth noting that in every era there have been people — sometimes even engineers — who are willing to bypass these for various forms of gain and/or convenience.

There is also the problem that only parts of “Software Engineering” resemble “Established Engineering” — much of what is called “SE” is considerably more ad hoc than the term demands.

An historical perspective on Engineering will reveal many changes in what pro-level engineers in each era do, and what kinds of tools they learn and use. However, the “Do no harm” ethic has been quite constant over time.

We also have to take note of the deep human trait of normalizing what is generally going on in each present time. For example, it is big news when a plane crashes and kills several hundred on board, but in the US in 2022 there was an average of 116 traffic deaths a day from vehicle crashes alone. Much worse, there is an average of 1315 deaths per day from smoking. This is not only not front page news, it is not even reported in most news media.

This means that a deep danger in the future of engineering would be a “normalization creep” downwards to levels of harm not considered acceptable today. We can see hints of this in many parts of computing, especially “social” media and so-called “AI”. These are quite dangerous and harmful in their current form, but most of the population, including the engineers involved, have normalized that causing harm in this and other ways is quite acceptable. (And, of course, we can see many other examples of technologies invented and produced via engineering that are gradually turning the planet uninhabitable.)

Another deep human trait is to try to avoid paying the full cost of most things, and especially to try to amortize costs not just to the future, but to future generations. For example, most infrastructure objects — such as bridges — are costed for constructing them, but not to replace them or do major maintenance 50 years or so hence. The software industries have taken up this unfortunate behavior all too whole-heartedly.

We also have to note that the level of expression that is used to create software has not changed much since the 80s — and that this is in great contrast to the progression of levels of language that happened in the late 50s to the late 70s. Meanwhile, “Moore’s Law” at the components level continued exponentially, without languages, development environments, approaches, etc. keeping pace (and not even keeping pace with the development and use of tools in the more established engineering disciplines — which, ironically, use computers for CAD and SIM in ways that the computing field itself shunned).

My guess is that the two main culprits were (a) rampant commercialization starting in the 80s, and (b) the lack of “level of skills” and ways to determine these that were analogous to the medical and established engineering professions.

When Facebook took itself off the Internet for a day a few years ago (including its own offices), I started poking around to see what led to this. I found that Facebook essentially does “unit tests” via thousands of configuration checking programs whenever a change is made. There was not a test that would say “you must not partition the Internet on purpose”, etc.

Note that trying to write thousands of individual laws that together capture the larger intent of a process doesn’t scale, and in the end is a terrible approach. This is similar to the Boeing 737 Max autopilot — which also had thousands of “laws” and responses. However, they didn’t have what John McCarthy called “commonsense” knowledge that most humans use to bridge between the specific and the general. Facebook code didn’t have any sense that the larger goal was to be on the Internet 24/7, and the autopilot didn’t know that it had people to protect and that diving towards the ground to correct a stall at low altitude is not to be done.
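To make the contrast concrete, here is a toy sketch (my own illustration; nothing here is Facebook's or Boeing's actual code, and all of the names and checks are made up) of the difference between passing thousands of narrow checks and also stating and testing the larger intent:

```python
# Toy illustration: many narrow "configuration checks" can all pass while the
# one check that expresses the larger intent ("stay on the Internet") fails.

def check_syntax(change):            # a narrow, mechanical check
    return change.get("syntax_ok", False)

def check_capacity(change):          # another narrow check
    return change.get("capacity_ok", False)

NARROW_CHECKS = [check_syntax, check_capacity]    # imagine thousands of these

def violates_larger_intent(change, network_state):
    """The kind of test that was missing: never partition ourselves from the Internet."""
    routes_after = network_state["announced_routes"] - set(change.get("withdraw_routes", []))
    return len(routes_after) == 0     # the change would leave nothing announced

def safe_to_apply(change, network_state):
    return (all(check(change) for check in NARROW_CHECKS)
            and not violates_larger_intent(change, network_state))

state = {"announced_routes": {"backbone-1", "backbone-2"}}
bad_change = {"syntax_ok": True, "capacity_ok": True,
              "withdraw_routes": ["backbone-1", "backbone-2"]}
print(safe_to_apply(bad_change, state))   # False: every narrow check passes, the intent does not
```

The sketch only shows that the larger goal has to be stated somewhere before any test can notice that it is being violated.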

What is quite sad here is that Moore’s Law did come through with the kind of scaling needed to handle these larger issues. But the ideals and perspective for what “real AI” (and “real programming” in the 21st century) need got lost when a subset of the “real AI” problem (fuzzy useful correlations with little meaning) replaced “real AI”.

——— added June 19 2024 ———

Here’s something recent, very dangerous, and to this point:

https://www.bostonglobe.com/2024/06/19/metro/state-911-outage-firewall/?s_campaign=breakingnews:newsletter

Yikes!

---

I reckon that we’ve now lost almost 40 years of needed development of “higher level expression with integrity of meaning”. Doug Lenat’s CYC system had many of the right ideas, but needed to be redone a few times to reflect learning curve effects on how the architecture should work.

Here’s one way to think about what is needed. It has analogies to classical engineering, but is of much larger scope. It also has analogies to “societal engineering” but constitutes a very large step from where we are.

The center of what is needed is a set of processes that can represent most of the important meanings and goals in our century, and that can do it in ways where the validations of meanings are understandable to us humans. Engineering has done this with various kinds of standards, including physical ones that represent the meanings of core concepts. Some societies have done this with carefully written relatively small constitutions, against which specific laws and actions can be tested. (Note that there are too many degrees of freedom for “case law”, so determining “legality” almost certainly has to be done dynamically.)

To my mind — maybe it is now just the mind of an old fogey — this returns us to many of the ideals and visions of the early 60s. This time around, there are enough resources to do a much more complete job at “Software Engineering with Meaning and Safety”. (It won’t look much like what is called Software Engineering today, but that is much of the point here!)

I’m currently a teenager and I started looking at philosophy. I want to start taking a dive into it but I don’t know where to start. Any certain topics or books that I should look into?

A good and somewhat tricky question …

First, what does philosophy mean to you?

I think of it as the kind of deep thinking that one has to learn to do when science, etc., no longer serves — but you will have your own slant on this. The underlying notions of science will still help thinking about issues that don’t lend themselves to empirical experiments.

The contrast between maths and science can also help provide some grounding to the use of logic and representations in language. Einstein pointed out to fellow physicists in 1921:

“As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality.”

The larger meaning of this is that reasoning in language can express “certainty” but this is only about the logical entailment of the conclusion, not about “the world outside our heads”.

Another path to be negotiated is the one between learning what others have thought in the past (and are now thinking) and the central point here of learning to think your own thoughts in deeper and more fruitful ways. You will need to build up a lot of both.

One way to do this is to note down questions as you read the ideas of others (this is better in the long run than e.g. simply underlining and learning the ideas of others).

It will help you (a lot) to write down your current burning questions.

I like “carpet bombing reading”, but sometimes a single “slim volume” can be a good way to get started. A good one — by Thomas Nagel — is “What Does It All Mean?”. You will find Nagel’s approach to getting into philosophy similar to what I’ve been advocating — I particularly like the introduction of this book: to me, it is the essence of how to start thinking about “difficult stuff”.

A few generations ago, Bertrand Russell wrote quite a few books for the general public about ideas, and presented them in a genial, good natured, but not trivial way. His book “The Problems of Philosophy” could be a good start.

In any case, the first cousin of philosophy is bullshit. This is both useful and a snare. The trick is to use bullshit to speculate, but then not to get caught in your own waste products! (e.g. by believing the bullshit). As with the invention of science, finding other ways to deal with ideas other than simply “believing” them is the key to making real progress.

Good luck, and have fun!

Can you provide some examples of a "thinking machine" that existed before computers were invented?

LexisNexis defines “machine” as

A machine is any system structure that can be used to complete a task by utilizing energy.

Lewis Mumford used the term “megamachines” as part of his approach for understanding and analysing societies — and pointed out that most societies were in fact “megamachines”.

“Artificial” — according to the Oxford English Dictionary — means:

made or produced by human beings rather than occurring naturally, especially as a copy of something natural.

With all this in mind, let me propose modern science as not only a “thinking machine” and an “artificial intelligence”, but as the most powerful and successful manifestation of these ideas so far in history.

I think this is quite lost on most of today’s computerists who work on so-called “AI” (which is very far even from the original definitions and larger understandings of the original founders).

For example, one of the keystones of “real science” is not only that internal deductions and inductions are insufficient for doing science, but that the whole enterprise — including empirical observations and studies — is extremely difficult in general, and for the human mind in particular.

This is partly because — as Francis Bacon pointed out 400+ years ago — the human race is not set up for thinking clearly (a) genetically (b) culturally (c) linguistically (d) academically. “Scientia” back then meant “gathering of knowledge”, but Bacon pointed out that the results of the four “idols” above that humans “worshipped” led to the “gathering of distorted and quite wrong ‘knowledge’”, and that what would very well help matters would be to invent a “new science”: new methods and new organizations that could partly get around our biased brain/minds, cultures, languages, and teaching.

What we call “real science” today is the “new science” that Bacon called for.

There are too many facets of “real science” for the size of a Quora answer, but let me point out that “real science” requires a society of scientists who are willing not just to explore in new ways but to test out the ideas in new and tougher ways (since most humans have “confirmation bias” which makes it difficult for the formulators of theories to test them deeply enough (so other scientists often do a better job of testing, etc.)).

(Note that funders often have considerable confirmation bias, etc.)

If any attempts are made to “do AI on a computer”, they should take into account what “real science” has found it necessary to do to generate necessary levels of confidence in the results. Without this we have a combination of confabulation with credulity — often associated with belief systems, such as religion — that has been really dangerous over history, and especially dangerous today.

What would be the place of AI in Doug Engelbart's vision of 'Augmenting Human Intellect'? How can we reconsider things within current AI 'developments'?

It’s very worthwhile to read Doug’s original plan in 1962. They could only work on a subset of this, and ca 1970 they wound up being identified with the subset (this often happens).

Find this version of the plan on Engelbart’s website (hint: type “Engelbart” into Google).

[image]

Below is the summary of the large scale plan. You can see in the red box a mention of various technologies they intended to use — and in the large scale plan itself, you can read how they planned to use “AI” (you need to remember that many terms — such as “AI” — had a very different meaning in 1962 (62 years ago!) and they certainly were not thinking of such limited and counterproductive technologies called “AI” today (such as ML, LLM, etc.)).

[image]

In talks that people have posted to YouTube, I’ve explained some of the big ideas from this deep group of researchers.

One big distinction in the 1962 time period is that they thought of “machine intelligence” as being a kind of complementary set of thinking tools that could be “symbiotic” to how humans were able to think. They were not at all thinking about something like a slave or a majordomo, but something more like a research assistant or a “Memex” (the latter was a big influence on Doug’s thinking).

In the very late 60s the “official AI researchers” started to think that something like “intelligent Greek slaves” were needed for the “Romans” (Americans), and became rivals to Doug’s notion of elevating human thinking rather than just elevating power. This was a bad idea then … and it’s a bad idea today.

---

For completeness, here are the references for this 1962 plan. Also, Ivan Sutherland and his work on Sketchpad are mentioned in this document, but there was no reference because Ivan wrote his thesis in Jan 1963.

[image]

[image]

How do I design an effective orientation program?

I don’t understand this question — could you expand in a comment? I’m not sure I can answer it, but right now can’t tell.

Why are object-oriented classes named classes? What is the etymology of object-oriented classes?

The second version of Simula (Simula-67) used the term “Class” for the descriptions that could be instantiated into “objects” which carried out the descriptions. Some of the background for this term very likely came from its use in mathematics.

The history has some points of interest. For example, the first version of Simula (Simula I) used the term “Activity” for the descriptions that could be instantiated into “Processes” which carried out the descriptions. Ivan Sutherland’s Sketchpad system predated Simula, and used the term “Master” for the descriptions that could be instantiated into “instances” which carried out the descriptions.

Other terms for “that description which can be instantiated” have been used over the years. For example, note that the “Processes” in multi-processing/time-sharing OSs are actually instances of a special kind of restricted virtual machine which the OS supports and protects.

It is also worth noting that in both Simulas the description (class/activity) is not an object. A language in which every description is an actual object — e.g. Smalltalk — turned out to be a very useful idea.
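As a small, hedged illustration (Python rather than Smalltalk, and greatly simplified): when the description is itself an object, it can be inspected, passed around, and even constructed at runtime, which is part of what made the idea so useful.

```python
# Illustrative sketch: a class is a "description that can be instantiated",
# and in a language like Python (which borrowed the idea) the description is
# itself an ordinary object.

class Point:                       # the description
    def __init__(self, x, y):
        self.x, self.y = x, y

p = Point(3, 4)                    # an instance carrying out the description

print(type(p) is Point)           # True: the instance knows its description
print(isinstance(Point, object))  # True: the description is itself an object
print(type(Point))                # <class 'type'>: classes are instances of a metaclass

# Because descriptions are objects, new ones can be made on the fly:
Point3D = type("Point3D", (Point,), {"z": 0})
q = Point3D(1, 2)
print(q.z)                        # 0
```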

If you had only 6 months to make a product using code, how would you do it? This question is inspired by Peter Thiel's approach to time management.

As posed, this question doesn’t have enough information to attempt an answer.

How can I convince myself that “real OOP” is a better way to build systems? What’s the reason other than Alan Kay’s “god-given vision” — Is there any math behind these ideas or are they just a bunch of “good-looking” ideas with "cool" rhetoric?

“Interesting” question, and “way to phrase it”.

First, I doubt there was any “god” behind my “vision”. And why would “math” make you feel any better if you are asking about “better ways to build systems”? (That is the province of engineering, which only part of the time can use math to help, and sometimes has to get by with “workable heuristics”.)

And then, a key question is whether my almost 60 year old vision holds today after factors of a few millions of Moore’s Law?

I always start off by doubting “things held to be true” in order to help thinking. And I certainly think that what I thought in 1966 should be suspect in 2024.

The basic idea was that a really simple way to think about systems — that was already in use — was the analytic idea of identifying subparts and the communications between them. My little idea was that this could be modeled as “little complete computers” on a “network” communicating by “messages”, and this could be “mathematically neat and complete” if every entity (including the parts, network and messages) was in the form of “little complete computers”.

The ancillary observation was that this would take a lot of software and hardware engineering to be made practical enough to work.

This is what I called “OOP” back then.
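A minimal sketch of that framing (my own toy example, not anything from Parc): each "little computer" exposes a single way in, and everything else happens by sending it messages.

```python
# Toy sketch of "everything is a little computer that only responds to messages".

class Account:
    def __init__(self, balance=0):
        self._balance = balance           # private state of this "little computer"

    def receive(self, message, *args):
        """The only way in: dispatch on the message name."""
        if message == "deposit":
            self._balance += args[0]
            return self._balance
        if message == "balance?":
            return self._balance
        return ("does-not-understand", message)   # the receiver decides how to respond

def send(obj, message, *args):
    """Messages travel over a 'network'; here the network is just a function call."""
    return obj.receive(message, *args)

acct = Account()
send(acct, "deposit", 100)
print(send(acct, "balance?"))            # 100
print(send(acct, "fly-to-the-moon"))     # ('does-not-understand', 'fly-to-the-moon')
```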

Trying to look at today, I think the ways of trying to understand systems by decomposition and build them by synthesizing organizations of parts are still the main ones (and are still problematical — but likely to still be needed and useful).

However, I think the practical tools for understanding and making systems have not kept up with the immense scalings that have happened over the last 60 years or so. (Something worth pondering is that a qualitative scaling should also provide more computing power for better ways to intercommunicate, and thus better ways to “send and receive messages” — and this means much better ways to decompose problems and model better solutions.)

Deciding what you think “Real OOP” means might help with your first question. If you think it means “the way we (Parc) e.g. built systems in the 70s”, then only a few of those principles still hold, and better ways are needed. If you think about what “Real OOP” should mean (in this day and age), then I think you need to help invent the practical forms of what is actually needed today.

I think it will be hard to actually understand any complex thing without “modularization” of some kind (and this will require techniques for “intermodule communication and coordination”).

It’s up to you to decide whether what I thought about 60 years ago has any bearing on this today, and whether it was a kind of “applied algebraics and logic to practical problems” (along the lines of how John McCarthy thought about things).

It has been said that Windows is Bill Gates’ way of helping Seagate to sell ever larger hard disk drives. If that is true, then I claim that Smalltalk is Alan Kay's way of helping Intel sell ever more computationally capacious microprocessor chips?

First, let me copy the OP’s answer to his own question:

Bill Buckley:

The problem with this post, which I placed, has much to do with Quora’s limit on question size. So, I will try again. Dr. Kay, is it unreasonable for me to have said in my Graduate Seminar course (for the MS Computer Science), “It has been said that Windows is Bill Gates way of helping Seagate to sell ever larger hard disk drives. If that is true, then I claim that Smalltalk is Alan Kay’s way of helping Intel to sell ever more computationally capacious microprocessor chips?”

---

The top question is complex linguistically. The first sentence looks to be a “proverbial metaphor”, but the next phrase is “If that is true …”.

It isn’t “true”, so the rest of the question — a deduction — doesn’t follow as the writer would wish (of course, implications don’t require their preconditions to be true, but normal usage is “if and only if”, and those do).

Now to address Bill Buckley’s comment.

I would offer that “yes, it was unreasonable for you to say that”. The largest reason is that I never was interested in helping Intel do anything but start to create “emulation processors” for dealing with a variety of very high level languages (e.g. much higher in expressive power and safety than C) that both existed and needed to be invented.

I viewed Smalltalk as “something good in its timeframe and scaling”, to be superseded as soon as possible.

The opposite of my aims actually happened: (a) Intel stayed with extremely poor architectures and tried to make them faster, but not better, (b) Smalltalk did not get materially/qualitatively reinvented, and is still around today in close to original form.

What is the Chronology of Early OOP? (see self-answer for details)

There’s “early” and “Really Early”.

“Really Early” mostly happened in the 60s (with a little continuity back into the 50s).

My best recollections are in a history I wrote at the request of the ACM in the early 90s: “The Early History Of Smalltalk”. A version of this (without the Appendices) can be found on Bret Victor’s site: The Early History Of Smalltalk

The notion of data structures containing pointers to procedures goes way back. I mention one example in this history that was in use when I started programming in 1961. Also, data structures with multiple fields were commonly called “objects” (procedure pointers or not) — so I made a bad choice in 1966 calling this new area “object oriented programming” (it conveys a poor image of the idea).

If I were to do another pass through this history, I probably would have said more about Doug Ross (whom Ivan Sutherland mentions in his 1963 thesis). Doug was an early programmer on Whirlwind in the 50s and one of the first to write digital graphical simulations (bouncing balls, etc.) that were common on analog computers (these were partly to show that Whirlwind was really fast — it was one of the earliest — maybe the first — bit parallel computers).

Two big ideas that were being thought about early on in different areas of computing were (a) “instantiation” and (b) that “meaning” had to include dynamics.

For example, in the data base world there was the notion of unit records with a particular arrangement of fields (initially on punched cards). One path led eventually to the Data Division of COBOL and what led up to it (this predated scientific computing by many years; e.g. Algol 60 only had numbers, strings and arrays — “records” were in the future).

With all due respect to Doug Ross, it was Ivan Sutherland ca 1961–2 who thought about instantiation of dynamic “objects” — this for the purpose of creating what is arguably the first real interactive graphics language and system: Sketchpad.

Just as interesting, and a tad earlier, we have Bob Barton describing the architecture of the Burroughs B5000 — which was also the first machine to have its OS completely written in a high level language (initially an extension of Algol 58, then of Algol 60, called ESPOL). The paper is “A New Approach To The Functional Design Of A Digital Computer”.

In this case the hardware was higher level than any of the languages of the day — it could be thought of as a “Simula I” machine but before Simula. It included both instantiation and “meanings as including dynamics”, etc.

Instantiation from a “Master” (the Sketchpad term) or an “Activity” (the Simula I term), both of which are now called a “Class” (from the later Simula 67), does not require inheritance chains to be extremely useful. The “Master” establishes a category that defines generically how the instances should behave and appear. Sketchpad allowed Masters to be made from any composition of instances, and this was very powerful.

As the Smalltalk history explains, a lot of early ideas about “processes” were combined into a single definitional idea for a new way to think about “objects”. Some of these additional ideas included the kinds of instantiation into “virtual machines” that multi-process and time-sharing OSs were starting to do — this is a great way to start thinking about the ideas, but the overheads in having processes be “large” had to be eliminated.

Pure math — in “algebras” etc — uses ideas that are like “inheritance” to get both power and logical coherency — so the notion of “inheritance” appeared early (e.g. in Simula 67).

I think I explained in the Smalltalk history that the Lispish late-binding approach taken by Smalltalk-72 allowed many different inheritance schemes to be explored (including the very useful “slot inheritance” adopted in some of the AI systems of the day, e.g. Pat Winston’s thesis).

The original plan for the Alto computer at Xerox Parc was to replace it with a much faster one in a few years to take advantage of Moore’s Law, etc. However, Xerox did not want to pay for many new machines (they were already complaining about the 100s of Altos we were making). This put a huge squeeze on researchers (like us) who were depending on HW capacity to stay very late bound.

The need for faster and larger on the same machine inspired Dan Ingalls to design Smalltalk-76, which was more of a compromise between the Lisp and Algol worlds, but was a brilliant design and not much flexibility was lost (and it was almost 200 times faster than Smalltalk-72). Part of the design was the decision to use (and be really careful with) single inheritance for the system (but to still try experiments in what Masters should look like).

This worked out well because we were primarily trying to invent a wide-spectrum usable form of “personal computing” rather than a programming language (and this helped all around). Smalltalk worked really well at the scales we had back then. But it was clear that e.g. “inheritance” was going to be tough to scale, and a workable multiple inheritance needed to be more than ad hoc ideas. For example, MIT’s Flavors were a perfect double-edged sword and very difficult to handle (and much larger in code than the entire Smalltalk system at Parc).

Basically: scaling really called for a “knowledge based epistemological design” that would require a lot more than just combinations of inheritance and compositions and mixins, etc. We considered this necessary, but outside the scope of our research.

How can anyone who is intelligent believe in god?

David Moore wrote a good answer:

https://qr.ae/pswpat

Once we realize that our biological brain/minds are (a) set up to remember and believe, and (b) all of this activity is internal, we can also realize that (c) what we think is “reality” is actually our own internal theatre playing back combinations of our beliefs and sometimes tiny signals from outside us. This means that we are always “dreaming” (and we hope our waking dreams are better maps of what’s going on than those while we are sleeping).

Using these terms we can see why David Moore uses the term “hallucination” — and we should also be able to see that the best science can do is to make better “maps” — “representations” — of what more careful investigations can find out.

Because scientists are humans, their brain/minds also believe — and good scientists will try to include the active idea of “this is a belief” as a label on their beliefs.

The term “god” in the question is vague: Wikipedia declines to give a simple definition such as “an entity superior to human beings in many — if not all — ways”. Instead, the article loads the term with many historical perspectives, etc.

Long ago I majored in Molecular Biology — and still find it endlessly fascinating.

One conjecture that MolBio gives rise to is that it is entirely possible — using just the biology of this planet — for there to have been or be “an entity superior to human beings in many — if not all — ways”.

I think most intelligent and knowledgeable people would be able to believe in the above sentence. It is about possibilities related to well known science and a weak, but reasonable, definition of the term “god”.

I don’t think there is enough evidence to actually locate such an entity, or to actually believe in its existence.

I think this would/should preclude belief in actualities for stronger definitions of “god”.

However, one of the interesting pitfalls of science is that when we are on the trail of a conjecture, it is hard to not “sort of believe in it”. And, one of the largest stumbling blocks in trying to think is “confirmation bias” — and we scientists have this problem along with everyone else.

I think “conjecture” is the twilight area for most thinkers. This is where our “waking dreams” have the most trouble (and fun). The theatrical nature of our biological mind is — most of the time — a great gift, and sometimes leads us astray. Frank Thomas, the famous Disney animator, once told me that they were worried that the audience for “Snow White” would not be afraid for her when threatened by the huntsman (she was just a series of drawings). But there was no need to worry, because — in theatre — the audience can get completely emotionally involved in wooden puppets, and cry when one is “killed”.

Here’s a good one. Suppose we believe we are an atheist — and have supportive reasons into which we’ve put a lot of effort. But, how tough should we be as a member of the audience in a great play where belief in a god is the center of the driving idea? Can we give ourselves over to the play — for the sake of the play — and then come back to our main set of beliefs when the show is done?

The next night, in the very same building — also with great words being spoken and great music being played — we are now in a political rally. I don’t think we should give ourselves over to it, just because it is set up like a theatrical performance. I think we should really be skeptical and try to avoid simple believing — even — especially! — when we think we like the candidate.

The real question is where to put our story worlds. I don’t think we should try to do away with them — so much meaning is wrapped up in them. But, the advent of science shows that we can also have mental worlds — still kind of story-like, still with some believing going on — that are much clearer depictions of “what’s out there?”.

Shouldn’t we be teaching children how to live in and use their various mental worlds, rather than being trapped in them?

How do we program a universal bootstrapper that can run any meta-circular evaluator?

As stated, this feels like a “trick question”, because any Turing complete mechanism can be programmed to run any meta-circular evaluator. I feel that I’m missing something here. Am I? Please tell me more …

For example, the LISP meta-circular evaluator assumes that a LISP evaluator already exists. One question could be: how do you bootstrap a LISP evaluator from a Turing complete mechanism that doesn’t already have a LISP evaluator? That can be fun to think about especially if you’d like the result to look as much like a meta-circular evaluator as possible. Most of the Smalltalk systems did this quite nicely as well.
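For a flavor of what such a bootstrap looks like, here is a hedged sketch of a tiny Lisp-style evaluator written in a host language (Python standing in for the "Turing complete mechanism that doesn't already have a LISP evaluator"). It handles only a handful of forms and is an illustration, not a faithful reproduction of McCarthy's eval; once something like this runs, the meta-circular version can be written in the little language itself.

```python
# Minimal Lisp-style evaluator sketch: symbols are strings, lists are lists,
# numbers are literals. Supports quote, if, lambda, and application.

import operator

GLOBAL_ENV = {"+": operator.add, "-": operator.sub, "*": operator.mul,
              "=": operator.eq, "print": print}

def evaluate(expr, env=GLOBAL_ENV):
    if isinstance(expr, str):                       # a symbol: look it up
        return env[expr]
    if not isinstance(expr, list):                  # a literal (e.g. a number)
        return expr
    head = expr[0]
    if head == "quote":
        return expr[1]
    if head == "if":
        _, test, then, alt = expr
        return evaluate(then if evaluate(test, env) else alt, env)
    if head == "lambda":
        _, params, body = expr
        return lambda *args: evaluate(body, {**env, **dict(zip(params, args))})
    fn = evaluate(head, env)                        # application
    return fn(*[evaluate(arg, env) for arg in expr[1:]])

# ((lambda (n) (* n n)) 7)  =>  49
print(evaluate([["lambda", ["n"], ["*", "n", "n"]], 7]))
```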

Val Schorre’s Meta II in itself was bootstrapped by writing it out in high level form and hand running it once to get the code for the VM to run (take a look at the first page of his Meta II paper). Alex Warth did a really nice job of making a succession of systems via getting an initial bootstrap going.

Etc.

——— Added 27 April 2024 ——— from the questioner’s comment below ———

Harri K. Hiltunen:

Remove the programmer. The universal bootstrapper program should work without human help. This is about communicating with alien computer programs without any programmers available. Earth sends Lincos followed by a human-like intelligent agent representative, other planets send other self-interpreters with their representatives, and there are billions of possible Lincos-equivalent self-interpreters being sent around the galaxy. The universal bootstrapper can run all of them without knowing what’s coming. The human is being removed from the bootstrapping for the purpose of, for example, making interstellar probes that can start communicating with their destination planets when in range, decades before arrival, without needing to waste decades relaying the alien communications back to Earth to be interpreted and converted into a software update to be sent back to the probe. This is also useful in the internet where ancient and newly generated programs are communicating with each other, trying to resolve broken or missing dependencies. If all programs have a universal bootstrapper available, they can prove to themselves that the offered services really satisfy the contract. No software could ever break in such an environment.

——— New Answer To Appear In Stages ———

I was not understanding the original question (as I guessed above) — I was puzzled because Harri always writes thoughtful comments.

This is a quite interesting question that I’ll have to think about for a bit. However, when Doug Lenat first did his thesis “AM” and then “Eurisko” — back in the 70s — some of what Harri is asking about did come up (and in the context of LINCOS). One question in two parts was (a) could a computer program like AM or Eurisko decipher the LINCOS sequence? And, just as interesting, (b) could a human being decipher the LINCOS sequence?

I have vague recollections that the (b) part was tried — and also tried with some of the better successors of LINCOS (including the Stanford one I’m thinking about but can’t find with Google).

The really outstanding part of AM was the several hundred formal heuristics for “interesting” and for following up “interesting”. And for Eurisko, it was the inclusion of the heuristics in a form that Eurisko could not just use but also do experiments on and improve. Doug wrote quite a few well rewarded papers on these two systems, and they are still very interesting today.

A really interesting intermediate computer system would be one that could bootstrap just one good example of a LINCOS type sequence. Again, one would need to do what Doug did about “cleansing the heuristics” (for AM this was done by Doug getting both his friends and enemies to go through the heuristics enough times to “bless them” as not containing hidden math relationships of the kind that AM was supposed to be able to discover).

Now I will try to find the elusive followup work, and ponder.

——— Added 27 April 2024 ——— First Steps ———

“Perspective/point of view/Context is worth 80 IQ points”

One of my favorite context stories was how Paul MacCready — in a few months — won the Kremer Prize for man-powered flight. He was driving across the desert in the 70s with his family, and thinking about his brother-in-law’s debt of $100,000 (which he had taken on to make his wife happy). That morning he had seen that the British Pound was worth $2.50 and a stray thought got him to realize that the £40,000 Kremer Prize for man-powered flight was worth exactly $100,000 — and he said to himself, if I could win this prize, I could pay off this debt.

He had a degree in AeroEngineering, had been the world soaring champion a few times, and knew everyone who was working on this problem. But the prize was decades old and no one had been able to do it. He said to himself “These are really good people — why?” And then he said “The problem is that we don’t understand the problem”.

He told his gang when he got back to Simi Valley — “our goal is not to win this prize, but to invent a simple air frame that can survive 10–12 crashes a day”. This is because his expert friends would make an elaborately designed airframe, do one or two flights, crash it and then spend another year building the next one.

Six weeks after his drive in the desert, his group had made more flights and had more crashes than all of the other engineers in history.

Six months after the drive in the desert, the knowledge gained by the many flights and crashes created an air frame — The Gossamer Condor — that won the Kremer Prize. A few months later, an improved air frame — The Gossamer Albatross — won the big Kremer prize of £100,000 for the first man-powered flight across the 23 miles of the English Channel to France.

Bottom Line for this section: what are the smaller tests that need to be devised to get a much better POV on what the actual problem might be?

——— Added 27 April 2024 ——— The Cuneiform Tablets of 2015 by Long Nguyen & Alan Kay

This paper has several useful accounts of history, and an example of a possible preparation that could bring a program today back to an informative as well as a working life 1000 years from now.

We started by describing the Domesday book of 1086 AD. One of our comments: “It may be that our descendants shall know more of England in 1086 AD than of England in 1986 AD.”*

*This is because paper lasts longer than any digital media we have today — the Domesday book still exists in almost perfect form — and because we still can read the Latin in which the Domesday book was written — and this is because many important ideas were discussed early on in Greek and then Latin.

We eventually describe a bootstrapping scheme that involves a file of bits in which the first part is a simple bit-map (one bit/pixel) image of a page with a brief description — probably it should have been in Latin rather than English (but we thought that a few English authors would be remembered 1000 years hence) — of a VM that any programmer could get working in a few hours’ hacking in an afternoon. This would run the rest of the bit-map, which would be something like one of the PARC completely contained Smalltalk systems. This would have a working development system, and — since the simple VM might not be fast enough in the future (hard to believe) — the contained development system would also assist the programmer to make a much faster kernel if needed.

The actual “content-content” of the file would be objects manipulated and given meaning by the whole system.

1. If we take the “MacCready Sweet Spot Principle” into account, then our first attempt at Harri’s question probably should be to first make a bit sequence that can be given to human computerists who would then be able to make the initial VM without any advice in any recognizable language*. They could use any kind of programs (or none) to help figure this out from the bits given to them.

*Note: we know that humans can use pictorial languages for describing things (but we don’t know this about aliens). Should we try to do the bootstrap above without using any images?

It would likely use many of the techniques and ideas in LINCOS and “Astrolinguistics” to get from numbers to relations to transformations. This would be interesting to do and see what happens. Perhaps there might be a modern Henry Kremer to put up a prize? Will there be a “Paul MacCready” who will see the best ways to do all the experiments needed?

An important side note at this point is that there would still likely be a very wide gap to jump from a really difficult problem for certain smart humans to making something similar and much wider for actual aliens. If one reads the literature, quite a few very smart people have been thinking about the “communicating with aliens” problem for many years (centuries), and many of the most important issues and barriers have long been put forth.

I think a good parallel project would be to see what a computer program — like an “AM” — could do with messages written in LINCOS or its successors. In this case those who write the message would not be those who write the computer program, to see if it can figure out — more or less from scratch — what is being conveyed.

Note that the social nature of human language (and the likelihood of its own bootstrap via cultural convention and “negotiation of even simple meanings in discourse”) make it a very poor vehicle for attempts to communicate “meaning without feedback” with faraway aliens.

——— Added 28 April ——— “CosmicOS” by Paul Fitzpatrick

I’m still just finding stuff from years ago that I already knew about, but have “vagued off” since. It is worth looking at the work of Paul Fitzpatrick — then at MIT — who took a shot at a real bootstrap from scratch to telling the aliens how to make an interpreter for a LISP-like language.

There is an older page here:

https://people.csail.mit.edu/paulfitz/cosmicos.shtml

And a newer one here that seems to contain the latest entire message.

https://cosmicos.github.io/

Here is an explanation of the project (at bottom of page):

https://github.com/paulfitz/cosmicos

Here is Paul Fitzpatrick’s blog (go down to the end/beginning and work up to find another explanation of CosmicOS)

https://paulfitz.github.io/

I also found that there indeed (apparently) is a Kremer-like prize, to wit: “the Russian billionaire Yuri Milner launched a campaign called the Breakthrough Initiatives, a $100-million dollar donation” (source Atlantic Magazine ca 2016).

I am surprised and disappointed not to find more (a) about what I used to know, and (b) what has been done in the decades since. I’m guessing this is a problem with Google, but still …

——— Added 29 April 2024 ——— after pondering ———

Paul Fitzpatrick chose to create a “single epic story” as his bootstrap. This is fun to read, etc.

However, my brain/mind is more or less back in the state it was when Long Nguyen and I — prompted by the recent “Christmas Project” initiated by Dan Ingalls* of getting the bits of “Notetaker Smalltalk” (from a file that almost wound up on the trash heap) going again — started thinking about 1000 year bootstraps.

Here we are thinking about “1000 light year bootstraps”.

Here is a ponder: the tradeoffs between McCarthy’s meta-circular definition and a Translator/VM definition, for example in Meta II by Val Schorre. The former only uses one language for both its own meta-language and its definition. Fun to read and figure out, but it lacks clues to how to get it actually running on a real computer. The latter uses two languages: one to define the operators of, e.g., a simple Polish postfix byte-code VM interpreter (like the hardware of the B5000), and the second to create a compiler of the target language into a program using the VM’s operators.
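For concreteness, here is a sketch (my own toy, not the machine from the paper) of the kind of “dog simple” postfix byte-code interpreter being pointed to. The operator names and encoding are invented for illustration; the point is only that a machine of roughly this scale is something a skilled programmer could rebuild in an afternoon from a one-page description.

```python
# Toy Polish-postfix byte-code VM: a program is a flat list of operators and
# operands; values live on a stack; one conditional jump makes it more than a
# calculator.

def run(program):
    stack, pc = [], 0
    while pc < len(program):
        op = program[pc]; pc += 1
        if op == "push":
            stack.append(program[pc]); pc += 1
        elif op == "add":
            b, a = stack.pop(), stack.pop(); stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop(); stack.append(a * b)
        elif op == "jump-if-zero":
            target = program[pc]; pc += 1
            if stack.pop() == 0:
                pc = target
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown op {op!r}")
    return stack

# (3 + 4) * 10  =>  [70]
print(run(["push", 3, "push", 4, "add", "push", 10, "mul", "halt"]))
```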

The paper above that Long and I wrote — “The Cuneiform Tablets of 2015” (TCTO2015) — pointed out that such a simple machine could be described that any skilled programmer could implement in an afternoon — and that this would be enough to get the rest of the file actually running and participating in helping the aliens understand it.

I am possibly in the very same rut here, but this seems like the way to go for this project, and it should result in much simpler writing and testing. Basically, get a simple machine emulated, get the rest of the system going by itself, and make it so that the rest of the system is not just interactive but can negotiate meaning.

This would result in the “big file” containing something like the beginning of LINCOS — or more likely something like Fitzpatrick’s CosmicOS — but just enough to define a Turing complete dog simple machine such as the one in TCTO2015. Then have the rest of the system contain the next bootstrap steps and the image of the “negotiable agent”.

This also reduces the burden on testing and debugging the home grown AIs to see if human AIs can create the bootstrap VM from the bits without cheating. (This would have to follow some of the paths already trod by Doug Lenat and AM and Eurisko.)

I think this would result in a really fun and worthwhile “MacCready” phase for the larger issues.

---

Participants in the “Notetaker Smalltalk” project 2013-14: Dan Ingalls & Vanessa Freudenberg (leads), Ted Kaehler, Yoshiki Ohshima, Long Nguyen, Alan Kay

What do You think of Smallscript (https://www.codeproject.com/Articles/5373567/Smallscript-A-Smalltalk-Inspired-Scripting-Languag) ?

Have fun with it!

And see what you could do to allow systems to be made that are distributed over the Internet.

Also, these days, I think I would take a shot at various granularities of “pseudotime”, etc. (most things in sentence two above will need it anyway …)

Cheers

How does Alan Kay perceive the recent reconstruction of the Antikythera mechanism in relation to his "Cuneiform tablets" paper?

As with many other scientists, I’ve followed the analysis and reconstructions of the Antikythera mechanism for many years — and have been lucky to have had a number of extended conversations with some of the principal investigators.

And, likewise with many other scientists, I’m very interested in “early thoughts” — their substance, shape, precursors, contexts, etc. — and how to tease something substantial from any kind of past record — whether far past, or recent, or yesterday.

My longtime great friend, colleague, and all around genius, Dan Ingalls, and I got very interested in how much “meaning and intent” one could include in a piece of software — rather than, as is the practice even today, having “meaning and intent” only be in the minds of the programmers, and sometimes in a few pieces of documentation or comments. For example, in the 60s and 70s, moving a piece of SW or a programming language, or a data base etc. a few 1000 miles almost — and sometimes actually — required moving a programmer along with it. This was especially critical if the receiving end had a different machine, operating system, etc.

And, the famous “Lick” had pointed out in 1963 that future scaling and networks meant that a main problem in the future would be “communicating with aliens” (meaning it would be similar to bridging gaps between many quite different contexts and beliefs).

Earlier, Hans Freudenthal got interested in actually trying to “communicate with aliens” and had worked out a language and scheme — called LINCOS — to bootstrap such communication. There are a few even more interesting proposals that have been done since.

https://en.wikipedia.org/wiki/Lincos_language

“Bootstrapping” SW was a favorite topic in the ARPA research community — whose main aim was to go far beyond incremental improvements of the present day. Bootstrapping LISP is a really fun topic and endeavour, and the Engelbart folks really got into bootstrapping in the mid to late 60s. Another super fun example was Val Schorre’s “Meta II” paper, which included instructions how to bootstrap it, plus many examples of non-trivial programming languages made with it.

The origin of Smalltalk at Xerox Parc came from a “hallway discussion” about “tiny most powerful kernels” for languages, and I thought that if you took what was understood about systems in 1972 and “inhaled the fumes of LISP and Meta II” the right way, you could get something that was even more (quite a bit more) expressive in the same tiny size. I went away on a trip, and when I came back Dan had implemented my little example as a proof of concept — and it was extremely fun to play with. When the Alto started working in April 1973, Dan and Ted Kaehler immediately reimplemented it, and in just a few weeks, we had something to experiment with that had real expressive power.

Our group had an extreme learning curve, and this was aided and abetted by the quick reformulations of the whole system that could be done via its meta-kernel.

—— jump cut to 10 years ago ——

An almost destroyed Alto disk pack had a suspicious file on it that looked like it might be “Notetaker Smalltalk” (or “Smalltalk-78”). As a “Christmas Vacation project” Dan, Vanessa, Ted, Yoshiki, myself, and a few others, decided to see if it could be brought back to life. This was possible because most of the system — which included its operating system — was written in itself and the file contained a complete image “snapshot” of the system at a point in time.

This is described (in itself) in a video I did for Ted Nelson’s birthday:

https://youtu.be/AnrlSqtpOkw?t=136

And in some detail in a paper — Reviving Smalltalk-78 — we (Dan and Vanessa mostly) wrote.

This got us thinking about “what should a file look like to do the same bootstrapping by programmers a 1000 years in the future?”.

Long Nguyen and I did a number of investigations about ways to do this. We did not go the entire LINCOS route, but decided to see if we could describe in a simple one page bit-map image a tiny VM that could bring up the entire system on a supercomputer of the future, and then that system could help the future programmers implement a much more efficient kernel.

So the overlap between the Antikythera mechanism and the “Cuneiform Tablets Of the Future” is pretty small: it is mainly about making reconstructions in the future of “lost” interesting past mechanisms much easier.

Has Alan Kay started any activist movements to realize his vision of personal computing? What were the hurdles? The negative impact of capitalism and corporations, etc. What can one do as a political act?

This is not my nature.

But Doug Engelbart had some feel for this — and still, wound up not getting through to the larger organizations of society.

I think part of the problem is that in our current day (and probably most past days), any real inventions — even with great power — that required quite a bit of new learning, are bypassed by most (who tend to wait until there is something, no matter how tiny, that they can sell).

So the “mouse” succeeded but the best of his ideas were not only ignored, but not learned enough to even be ignored.

This used to piss Doug off. He used to say “The mouse is just a button on the car radio — we invented a whole car!”

Eeecks!

What does Alan Kay think about collaboration as it applies to Xerox PARC? What about the structure of PARC, and the people who were there, contributed to such amazing collaborations and outcomes?

There are parts of the answer to this question in some of my other answers about Xerox Parc. (I would put in links here if Quora were easier to search — maybe some readers will help.)

The best book I’ve read about this research community is “The Dream Machine” by Mitchell Waldrop (it is in print). The book looks at the larger story of ARPA’s “Information Processing Techniques Office” (IPTO) via its founder JCR Licklider (universally known as “Lick”), a wise experimental psychologist who got interested in interactive computing as a potential “symbiotic intellectual partner”. This book is a must because Parc was in most respects an offshoot and integral part of the IPTO research community — its researchers were mostly recruited from IPTO and they brought its way of doing things. It was perhaps a bit “more like IPTO than IPTO” because Bob Taylor (who had been a director of IPTO in the 60s) pushed specifically for some of the best processes he had seen.

These included “arguing a lot, but only to make progress, not to win”, hiring processes and day to day interactions that minimized rivalries and maximized possibilities for cooperation, etc. My only real complaint about “The Dream Machine” and most of what has been written about Parc is the omission of the amount of cooperation in the whole ARPA community, and of why it happened.

Some of this “zeitgeist” is traceable to the many activities around MIT during WWII, and especially the massive radar development effort in Building 20 (the “RadLab”), that was catalyzed by “various gifts” by the British and by look ahead visionaries.

Jerome Wiesner was a student while in the “RadLab”, later got a PhD at UMich, was head of the research lab of electronics at MIT, President Kennedy’s science advisor, then President of MIT, formed the Media Lab with Nicholas Negroponte, etc. etc. He was a “smart solid citizen” type (as Dave Evans would say). He had numerous experiences in the RadLab, and was very interested in why it worked, given the normally highly opinionated egos and top talents during that time (many Nobel Prizes were later awarded to RadLab members).

Jerry said that the original group decided at the beginning that the only thing that counted was “progress in the development of all the different kinds of radar system needed”, and that “egos had to be checked at the door”. This apparently partly led to the notion of “arguing for progress rather than to win”, and this was enforced as part of the social mores of the lab. Another part of the social glue of the RadLab was the fun of making gadgets that all shared, and this developed into “the fun of making gadgets together”, where the complementary nature of the top talents was recognized and synergized. This led to a special kind of “cooperative competition”. (This apparently also happened in its own way at Los Alamos.)

https://en.wikipedia.org/wiki/Jerome_Wiesner

Another aspect of this to consider is that there was very little competition for funding — WWII — and especially Roosevelt’s science advisor former MIT professor of EE: Vannevar Bush — made sure there was a lot of funding available for many important projects. Years later this was not lost on ARPA and Lick (who realized that funding whole departments in universities made a lot of sense, and would soften the usual university political competitions for space and students, etc.).

In any case, by the time Parc was set up — and by one of the former ARPA IPTO directors — there were plenty of young IPTO researchers in the various ARPA projects who had been brought up in the “ARPA Dream” of “interactive computer intellectual amplifiers for all, pervasively networked worldwide”, and were used to the “ARPA way” of doing things, and especially how the young researchers were able to cooperate (and that ARPA IPTO funded their development and cooperations).

So the big vision was already in place, and so were the researchers, and so were most of the processes. The catalyst was Bob Taylor — who knew, and had funded, most of the researchers in the community — and he was a critical factor in convincing us that it would be OK to “do research in a company like Xerox”. Parc would not have happened without Taylor.

There was also a lucky happening in the earliest days of Parc. The group around Butler Lampson — one of the most brilliant of the young researchers in the 60s — had been at Berkeley, and then set up a company to make an advanced time-sharing system (the BCC 500). A business downturn made it impossible for them to continue at just the time Taylor was setting up Parc. Bob knew them and was able to convince the whole company to go back to being researchers and all join Parc. So, in one swoop — Jan 1971 — Parc got a critical mass of about 12 super-top former ARPA IPTO people “who could do anything”.

I was consulting for Taylor at the time, and was about to go to CMU to work on “personal computing”, but this got me to call Pittsburgh to tell them that “things had suddenly gotten much too interesting in Palo Alto”. The combination of Butler Lampson and Bob Taylor was a tremendous draw for further recruiting.

https://en.wikipedia.org/wiki/Butler_Lampson

I have written a Quora answer about Butler (somewhere). One of many big deals here was that he was super-brilliant, highly opinionated, could argue rings around anyone, very devoted to his ideas and projects and was also very helpful to projects that were not particularly in line with his opinions (like the ones I was running). This was crucial to Parc’s success, in that it could sustain multiple projects with similar goals but different methods and cooperate whilst doing so. (This was not perfect, but was generally done very well as far as I was concerned.)

I give most of the ultimate credit to the combination of Bob and Butler and how they complemented each other’s roles.

——— Added 22 April 2024 ———

I forgot to add a different longer range slant by some of the same pioneer players. For example, Lick and Bob Taylor wrote an influential paper in 1968 “The Computer As A Communication Device” that is well worth reading. Here is one quote from it:

“Take any problem worthy of the name, and you find only a few people who can contribute effectively to its solution. Those people must be brought into close intellectual partnership so that their ideas can come into contact with one another. But bring these people together physically in one place to form a team, and you have trouble, for the most creative people are often not the best team players, and there are not enough top positions in a single organization to keep them all happy. Let them go their separate ways, and each creates his own empire, large or small, and devotes more time to the role of emperor than to the role of problem solver. The principals still get together at meetings. They still visit one another. But the time scale of their communication stretches out, and the correlations among mental models degenerate between meetings so that it may take a year to do a week's communicating. There has to be some way of facilitating communication among people without bringing them together in one place."

They had both funded Engelbart, and this is a restatement of Doug’s central idea and purpose — and in fact was central to what Parc’s vision was all about. So — looking at this from Taylor’s point of view — given that the technology to do the above was not yet invented, a main problem with Parc was going to be (a) the absolute need for “the few people who can contribute effectively” to be brought together, and (b) to stave off the likely sources of “trouble” from those “who are not likely to be team players”.

How does Alan Kay view the connection between the relational model and "real OOP"? How would you avoid the “impedance mismatch” problem with RDBMS? How was it with Smalltalk systems?

This deserves a longer answer than this.

One interesting history is that of “unit records”, eventually being kept as a deck of “Hollerith” and then “IBM” cards. The deck would most usually use one card for each record, and would use the same fields — a sequence of characters — on the cards for the same kind of data. The punched card accounting machines used “plugboards” to program operations on the deck (such as sorting, tabulating, etc.). A deck was thought of as a kind of “table” with fields running left to right and records from top to bottom.

Typical fields would be “Employee Number”, “Name”, “Salary”, “Date of Birth”, etc.

Fields one would not find would include e.g. “Age”. So answering a question like “Age of Employee X” was a vastly different question than e.g. “Name of Employee X”.

Eventually, the cards were put to tape — initially one for one with the cards — and eventually with much longer records and more fields.

A very high percentage of data on IBM cards was processed by IBM machines and then computers.

IBM had an opportunity in the early 60s to move from this limited model of the world to one that included “processes as things to be retrieved”. For example, there was no actual barrier that prevented IBM from including runnable code within a record itself, or part of a “data file” on a tape.

This would allow dealing with more complex kinds of things one might want to save, find, ask questions about, etc. For example, at this point “Age” could readily be a field that directed a computer to fetch code to run that would look at a Date Of Birth and today’s date, etc. Such structures existed in the early 60s, especially in several labs at MIT (where IBM also had a large research lab).
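A toy sketch of the missed opportunity (the field names and representation are mine, not any actual IBM format): a record in which “Age” is code run against the stored Date of Birth, so asking for it looks just like asking for a stored field.

```python
# Illustrative record in which one field is a procedure over the rest of the
# record, rather than another stored value.

from datetime import date

employee = {
    "Employee Number": 1234,
    "Name": "A. Example",
    "Date of Birth": date(1960, 5, 17),
    # a field whose "value" is code: approximate age in years (ignores month/day)
    "Age": lambda record, today=None: (today or date.today()).year - record["Date of Birth"].year,
}

def ask(record, field):
    value = record[field]
    return value(record) if callable(value) else value   # "fetch code to run" when needed

print(ask(employee, "Name"))   # a stored field
print(ask(employee, "Age"))    # a computed field, asked for in exactly the same way
```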

It is worthwhile to consider both the corporate mind and the mind of the markets it served in order to understand why IBM and other companies persisted in recapitulating the old card image system, even though it actually wasn’t a very good way to model the complex world we live in.

Similarly, one can see this tendency to “automate the obsolete (but familiar)” in the use of terminal emulators, which themselves were emulations of the punched card and key punch eras.

A really interesting side part of the story is that “transactions” did need to be introduced, so relational data bases were now living in “pseudo-time” (a very good idea).

Part of the early motivations for objects came from the desire to deal with modeling and querying in the simplest uniform way (the “Age” query is a simple example that breaks the oversimplification of rudimentary data bases).

—— Added 5 May 2024——

The other two answers made me realize that I didn’t address the Smalltalk part of the question, and the other two answerers thought I did.

What I would do would be to design a “knowledge” system with a semantics that includes as many relationships including temporal ones as possible. This would try to work both from models and from the kinds of questions and operations that might be desired (so: from the end-user’s point of view).

This would be very much in keeping with some of the classic work in “old time AI and Expert Systems”. An interesting example of a large version of this was Doug Lenat’s CYC system.

About Smalltalk: at Parc we created a new kind of Smalltalk and implemented from scratch every two years (72, 74, 76, 78, 80).

Something weird that happened when the first version was released to the public — Smalltalk-80 — is that — despite this version having all the tools to make very different kinds of languages (and Smalltalks) — everyone “on the outside” just took ST-80 as a “language” rather than as a “metalanguage for making the kinds of systems you want and need”. This completely missed what Smalltalk and Smalltalk programming was really all about.

So: start with the knowledge system design you want and need, then use useful metalanguages to implement a test version, then use useful metalanguages to make a solid releasable version.

What does Alan Kay think about Ray Kurzweil’s concepts of Singularity and The Law of Accelerating Returns?

These ideas have been around for some time, and are due to many people (see the wikipedia article for a short summary).

https://en.wikipedia.org/wiki/Accelerating_change

A graph I sometimes use in talks to help put more context on these claims looks like this:

image

The key idea here is that — at some point in the past — we started improving “power” via invented techniques and methods, and much of this is in accord with compound growth laws.

However, the “wisdom” (green) needed to use the power has not followed the same exponentials (and one could argue that the slight linear increase I show here is more of a wish than an actuality).

The red is a time and state when whatever wisdom is around cannot prevent a disastrous use of the powers.
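As a toy version of the arithmetic behind the graph (the growth rates below are invented for illustration, not data): if “power” compounds while “wisdom” only creeps up linearly, the gap eventually exceeds any fixed safety margin.

# Invented rates, purely to show the shape of the argument.
power, wisdom = 1.0, 1.0
for year in range(0, 101):
    if power > wisdom * 10:      # an arbitrary threshold standing in for the red zone
        print(f"the gap exceeds the margin around year {year}")
        break
    power *= 1.07                # compound growth (assumed ~7% per year)
    wisdom += 0.02               # slight linear increase (more wish than actuality)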

Vi Hart has a very good line for what we need: “We must ensure that human wisdom exceeds human power”.

Humanity has done a terrible job of even recognizing this problem, much less trying to deal with it.

What is the most shocked you have been when waking up?

My wife and I were renting a flat in London — where she was working on a book. I woke up in the middle of the night — for “old men’s reasons” — and decided to see how much Hillary Clinton had won by.

I don’t think I’ve ever been more surprised upon waking than I was to find that Trump was going to win the election.

I realized that I was vastly out of touch — out of understanding — with tens of millions of my fellow American citizens. It wasn’t the “unusualness” of Trump, but the idea that millions who had gone to school in the US had not learned enough of the real issues — or how to think about them — to see the monumental mistake they were making.

This also helped me put in context the also recent “Brexit” results here in the UK — and I think with similar failures of public education here (this is just a surmise by a fairly ignorant visitor).

An American politician — Tip O’Neill — once said: “All politics is local”. The extent to which he was right was — and is — shocking.

—— Added 24 April 2024 ——

I should have added that what jumped into my mind after the shock was a comment made by the commentator H.L. Mencken in 1920:

“As democracy is perfected, the office of president represents, more and more closely, the inner soul of the people. On some great and glorious day the plain folks of the land will reach their heart's desire at last and the White House will be adorned by a downright moron.” ― H.L. Mencken, On Politics: A Carnival of Buncombe

What is Alan Kay's advice for smart people who don't yet know what they're interested in working on? How deep should people explore topics like maths and science before deciding it isn't for them?

The advice given by “The Valark” below is pretty good. Part of the idea is that for the most part each person is interesting and idiosyncratic enough to need “custom advice”, which is often best obtained through many experiences of their own. Einstein said “Love is a better teacher than Duty”, and this means in part that what a guy named Alan Kay might have to say is likely to be irrelevant.

Added April 15th 2024 — it occurred to me that I should have included what I personally wound up doing. I have put this at the end so as not to mess up the original sequence.

But we should look at the second — more general — sentence of this question. To me, it is in the category of “How far should I try reading and writing before deciding it isn’t for me?” My reaction to this is “These are too important to decide to ignore for any reason: please press on to gain enough fluency to be part of the literate world”.

I feel the same way about mathematics and science, and will explain why in a minute. First please read this answer where I mostly quoted Kurt Vonnegut about art:

https://www.quora.com/Do-you-agree-with-Alan-Kays-comment-that-a-computer-science-degree-teaches-students-how-to-write-bad-code/answer/Alan-Kay-11

From the list of arts he asked the children to pick one, and do something just for the hell of it. He could have added maths and science — but he would have had to give a little hint, because most children and adults are not as aware of these as they are of drawing a picture or writing a little poem.

This is because the way “maths” and “science” are presented in school could hardly be farther from their actual nature. For example, writing a little poem or drawing a little picture are rarely (I hope!) presented to children as a means to some end, but learning math and/or science is all about ends rather than how the experience works as fun and internal mental vitamins.

A nice Marvin Minsky quote is: “You don’t understand something if you only understand it one way”. This raises the ante on how we use the idea of understanding, and it is another way to approach the paths that Vonnegut urges for the children — a lot of his urging is to explore what it means to make something yourself (and in a principled way). Picasso had a nice line for this: “Learn the rules like a pro so you can break them like an artist”.

We often have to be reminded about the largest and most important difference between maths and science: that our ideas and beliefs and reasoning about them (maths) happen strictly between our ears, and thus can be consistent but not at all in accord with our external situation or what we should be doing — whereas science is done with the knowledge that it is being thought about by our limited brains and so we need to put in a lot more work to draw any kind of conclusions that might have something to do with “What’s out there?”.

Einstein, in a talk in 1921, felt he had to remind his audience of physicists about this: he felt they were taking their mathematical conclusions far too literally about “What’s out there?”. He said to them:

“As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality.”

If we are able to see that what he means by the “laws of mathematics” refers to all human reasoning, then we can see why getting fluent in maths and science is critical for general daily life (as with reading and writing). He is talking about the need for a special new kind of sanity for humanity: one that both allows internal reasoning and then requires it to be “negotiated” with the actual universe we live in (whether within our families, towns, countries, etc).

Just before he told them the above, he pointed out why we need mathematics (better more careful thinking and relating):

“it is mathematics that offers the exact natural sciences a certain measure of security which, without mathematics, they could not attain.”

Again, please realize this says “It is clear thinking and reasoning that offers our dealings with the world a security which, without clear thinking and reasoning, they could not attain”.

Then couple this with “As far as our clear thinking and reasoning refer to reality, they are not certain, and as far as they are certain, they do not refer to reality”, and you will have a major idea in your grasp, that all humanity needs to understand and practice.

Maths and science provide many more ways to view our situations, and the science part also protects our powerful conclusions from becoming like religious beliefs — they must remain tentative with respect to “reality”.

“Mapping” is a part of “maths and science”, and also a good metaphor for the larger scope of their pursuits. In human history we have had maps constructed from beliefs — e.g. showing a location for the Garden Of Eden — and many maps that attempt to depict what travelers will find. Both are inaccurate. The best of the latter type can be very helpful. None of them depict actual “Truth” — in the end they merely represent beliefs. The depiction of a map is its “math part” and its relationship to what is found when visited is its “science part”. I think we can see why getting this clear and straight is critical to human life and society.

In other words, “science and math” are not just about dealing with the physical world. Enlarged to their real definitions and methods, they are really about dealing with the defective maps we humans create to try to understand everything around us. Getting better at this is not just improving how we operate, but it is critical if we want to have a workable strongly participative society.

The great educator of children — Maria Montessori — pointed out that the main business of early childhood education is to have the children experience in their environment the strongest ideas of our civilization — to have their outlook become the best we humans can do. To do this we need to heed the ideas of all the above thinkers — and more — to help others to gain a much richer notion of where and who they are.

Added April 15th 2024 — here’s what I wound up doing with regard to the first question. This is not advice, just history. I had no plan or future goals, but was intensely curious and had learned to read fluently several years before school.

School was not a good fit, but I kept going back all the way through college, because there was more chance of getting a good steer than by just being an autodidact, and this even after serving in the US Air Force. I wound up doing way too many things, but the critical thresholds in school helped these be non-trivial.

So: pure math, molecular biology, anthropology, English (especially theatre), jazz, pit playing, classical organ, drawing and painting, and paying for all with the trade I learned in the Air Force: programming. This resulted in a burnout, but also graduation from U of Colo.

The burnout made me not want to go to grad school in either math or biology, and I didn’t want “a job”. I bummed around for a few months — with what turned out to be anxiety attacks (I don’t recommend these) — until I had the thought: “I can program but I don’t know anything about computing: I could spend a year somewhere getting a masters and cool off”.

I liked the altitude and climate in Boulder, so went to the library to look for places that (a) had a Masters in computing, and (b) were above 4000 feet in altitude. There was just one in 1966: the U of Utah, and they had just started an “ARPA research project”. I had never heard of either. This turned out to be truly great luck, because it was just right for my peculiarities. I later found out that I was accepted because Dave Evans didn’t look at transcripts, just resumes — he said that transcripts are almost never good predictors of success as research grad students.

A fun footnote is that over the next few decades I wound up using every single thing I had learned when I didn’t know what I was doing, but was just being super curious. This could possibly be the next best thing to specific advice when you don’t know what to do: keep learning what you are interested in — you will have so many possibilities when you stumble into a good situation.

Did pipe organs develop at all during the classical period?

“Classical period” in history (e.g. the ancient Greeks) or “classical period” in music (e.g. Haydn, Mozart, etc.)?

The answer of Alan Piersonn to “Did the Greeks create the first pipe organ?” gives the bare facts for the ancient Greeks. If the question meant that, then “Yes” is the answer. Technically, it happened during the Alexandrian Greek period (which came many decades after the “Periclean Greek Golden Age” — but was not at all inferior in intellectual power and knowledge).

For the second meaning of the question — the time period according to Wikipedia of about 1750 (J.S. Bach’s death) to 1820 (almost the death of Beethoven & Schubert) — this is a bit ticklish to answer.

My “more than a hobby” for most of my life has been playing the organ, and I have also helped design and make a few instruments. For many years I owned a really nice 17th century style pipe organ (it has since been donated to a US university).

My personal short answer to the literal question would be “Hardly at all to no real change or progress during that time”.

This might be too harsh, but — as far as design and conception go — there was virtually no change, and one could argue that one of the main “golden ages” of organ conception and design had peaked around the end of the 17th century with what is sometimes called the “North German” or “Schnitger style” instruments. This style got started partly because of the Hanseatic League trading cities, whose wealth and a degree of rivalry fueled the development of large instruments with full pedals and a wide variety of sounds.

However, a fun part of organ history is that — in France, starting arguably 20 or so years after the classical period, and probably fueled by the orchestral sounds of “Romantic Music” — there appeared a remarkable young organ builder by the name of Aristide Cavaille-Coll who had his own — and very different — ideas about organ sound and how to build instruments.

image

image

Cavaille-Coll built his first instrument — St Denis in 1843 — at a young age. It was quite large (he must have been an impressive young man to have been funded) and it was built behind the older case shown here. From the start he had a particular kind of sound in mind — one that in part involved the kind of blending of voices and effects (such as crescendos and diminuendos) that orchestras were starting to do, and which included several inventions of new pipe sounds (such as his special conception of “harmonic flutes”, etc.) that solved some old problems and allowed new ideas about tone.

These organs were also possible because the 19th century revolutions in mechanical technique also included an invention by an Englishman — Barker — which was essentially a “pneumatic amplifier”, sensitive and powerful enough to be inserted between the keys the organist played and the valves in the chests that admitted air to the pipes. This permitted very large and much louder instruments to be built, and yet to be played musically and delicately.

This was the start of a revolution in organ building in the 19th century. Cavaille-Coll’s company made most of the large organs in France (and many of the small ones), about 600 during his lifetime.

An important point to end this too short answer is that while C-C revolutionized organ sound and building, his instruments did not obsolete the earlier 17th century instruments (though they did for a while in the minds of the avant-garde). Today we organists think of these historical instrument types as two parallel peaks — two golden ages — in the long long history of the instrument (which indeed does go back to the ancient Greeks B.C.). Today there are many wonderful examples of original instruments from both the Golden Ages that are regularly played.

If Alan Kay could travel in time, back to the inception of computing, but could only take a notebook as a knowledge base from today, which insights would he pick/ignore, how would he approach recreating computing, and which year would he choose?

I don’t know how to answer this question directly.

But let’s consider Doug Engelbart, who in effect, did something very much like this.

In 1962 he wrote an extensive research proposal (for funding from the Air Force) with the title:

image

Using the phrasing of the question, here is his “notebook as a knowledge base from today” brought back from the future he was able to generate in his mind. In this “notebook” he included most of the important things that were needed and would be needed.

Virtually none of them were actually doable in 1962, and the spectacular and famous “Mother of all demos” that he and his group created by 1968 and showed to 2500 people in San Francisco was just a bare subset of the notebook.

A much smaller subset (along with distortions) got implemented over many years by industry, and the center of most of Engelbart’s important ideas is lacking in today’s technologies.

My points here are that (a) bringing “notebooks back from the future” is what the visionaries have always done, and (b) almost no one will listen to anything that is qualitatively different from the norms of the day.

What has been learned over and over is that humans are generally resistant to ideas and tools that do not seem to benefit them directly and quickly. And — unless there is a war or equivalent enormous obvious threat — society will not fund anything that isn’t simply derivative from the “normal present” — no matter what the ROI has been in the past.

How can I get in contact with Alan Kay for an interview for my high school project? I want to interview him about the history behind the development of the GUI and his work at PARC

Please tell me more about your high school project …

In any case, a good place to start is to read enough of what is available to gather enough context to ask good questions and to see how the answers fit with history.

Here’s a relatively short tribute I wrote about the larger research community I was involved with, “The Power of the Context”; it has a fairly extensive bibliography.

Some of the work I and my group did at Parc is in a history requested by the ACM: “The Early History Of Smalltalk”.

A longer white paper I wrote for the MacArthur Foundation in the UK is about “How?” really big efforts have been set up and succeeded.

A book about the larger context that is pretty good is “The Dream Machine” by Mitchell Waldrop.

———— I’m still hoping for a comment from the student who asked the question …

If Alan Kay were starting a computing research group today, how would he think about where to locate it?

Interesting query. I have only had extensive experience with one style of “research grouping” — the ARPA/Parc style was very successful — but it is quite possible that there are other styles that would also work very well.

The central idea in most groups, and especially research groups, is to have the group be far more capable than the most able single members. One way to do this is to find ways to have diverse top people be able to synergize their abilities. This is not easy, but has been done quite a few times.

Another issue is scale. My experience is that the size of a good research group (up to 10 or even 20) is not enough to create the larger “research environment” needed (which needs to have several top research groups). Parc was (possibly) the smallest “research environment” that provided what research groups needed for themselves, and what produced the larger richness that the whole environment needed.

IMO the above (still today) needs to have everyone co-located.

Another level of scale is “research community” — for example, the ARPA-IPTO research community — which can span the larger field of interest (still within the larger visions). In the 60s this was about 15–17 large “ARPA Projects”, mostly at universities and some at defense related companies. Grad students and interns were the main messaging actors, with a variety of larger meetings each year of healthy subsets of the community (for example, each year there was a week long “ARPA grad students” meeting of the “top two” students from each project).

Everyone could see that some of the advanced technologies the community was inventing would be of great use to the community: personal computing, pervasive networking, next versions of the Engelbart system, etc. An interesting side note here is that the Engelbart system had a version of collaboration that was more advanced than what exists today, and that worked very well for distributed meetings. However, their preferred method was to have all in a meeting be in the same room and to use their advanced system for collaboration.

The co-location needs stemmed from the ARPA-Parc practice of “bull-shitting around”. A lot of time was spent in ways that would horrify most managers today: lots of beer, lots of hiking, biking, tennis, skiing, etc. An important part of this is that the researchers were basically fanatics about their projects, so lots of hours were spent on these, but just when each day was at the discretion of the researchers. The “bull-shitting around” also included a lot of “semi-serious” speculations about “life, the universe, and everything else”. This often included spur of the moment insights about the actual research work.

I don’t know how to explain the previous paragraph any better. This doesn’t work nearly as well at the smaller “research group” size as it does at the “research environment” size.

One bottom line: although many good things can be done with VR/AR, they are not nearly enough. More recent experience shows that Bret Victor’s “Dynamicland” approach — which was not technically possible in the 70s at Parc — would have really boosted the work.

Now we get to the crux of the question. A huge problem today is that the inflation of real estate prices far beyond regular inflation has changed the dynamic and constraints for all researchers, and especially young researchers. For example, I was a year out of my PhD when I joined Parc and my little house in Palo Alto within biking distance was about twice my modest yearly salary. Since then regular inflation has perhaps been a factor of 5–6, whereas the house I had back then has inflated by at least a factor of 50 (meaning that real estate has added an additional factor of 10).

If we add in the intangibles of “being near a major university”, etc., then we have severe problems with desirable locations and funding that didn’t exist back in the 60s and 70s.

Most universities (and companies) were slow to see that their future staffing partly depended on them being in the real estate business (Stanford was an exception).

I will add more to this after seeing a few comments.

If earning a lot of money is my target, what major should I choose? My interest was in computer science, but people are saying to me that don't do it because AI is going to replace it. What should I do?

I don’t think I can give you a satisfactory answer for a number of reasons. The first and largest one is that I’ve never had making money as a target.

I have always thought that “making money” is much too small a goal for human beings. At best, it could be “something to be noticed occasionally whilst doing something you love”. Along with that, “quality of effort” is almost always a better way to judge yourself than “quality of results”. High quality of effort often brings high quality of results, whereas a bare desire for high results can be nerve-racking and can wreck what you are trying to do.

That great philosopher — Linda Ronstadt — once said: “All you can do is follow your instincts, and refine your talents to support your instincts”.

The second reason is that it is hard to deal with the topic of “AI” in the large these days without many more definitions and side conditions — and especially with (current) “AI” and programming, let alone computer science.

“AI” used to mean what is called today “Artificial General Intelligence”. The original term was co-opted — as sometimes happens when people want to claim progress by relabeling what they are doing with the name of the original, harder art form. (This happened for “object-oriented” as well.)

A larger problem is trying to replace something that affects human life and society with something easier to use but much more dangerous. Examples include systems that work some of the time but have deadly flaws that require real understanding of the embedded context to spot. Working some of the time allows demos and sales, but the flaws can and do literally kill. Examples include devices such as the Boeing 737 Max autopilot, which could sometimes fly a plane, but didn’t know it was flying a plane — and with passengers — so would sometimes do un-aeronautical things — like trying to fix a stall by diving, but at too low an altitude, etc.

Something to ponder is that some of the prevailing ways of thinking in business are trying to use “ANI”s for convenience and to save overheads (the “N” stands for “Non”). So they are already destabilizing what are already many not-stable-enough systems. People are starting to use “ANI”s to generate “plausible bullshit” in the manner satirized in Jonathan Swift’s Gulliver’s Travels, where the philosophers on their floating island have a machine that generates random sentences which are searched for “ones that might mean something”.

This is hitting the nail on the head from just about 300 years ago (1726). How could Swift do this so well so long ago? Because he knew just how ignorant, stupid, greedy, and gullible most humans are and always have been. This means that satire in almost every age will ring true to later ages.

Philosophically, “Real AI” should be possible. It is difficult to make any power safe enough to be used, and — so far — it has been impossible to educate humans well enough to use any of their powers wisely enough.

Vi Hart has a great line for this: “We must ensure human wisdom exceeds human power”.

This seems like an excellent answer when someone asks “what should I work on?”

How does Alan Kay view the use of artificial intelligence to guide diagnosis and treatment in clinical medicine? I've been invited to write a "Con" review re: AI in hypertension (someone else to provide "Pro") & I wish to learn Alan Kay's thoughts.

Actual “artificial intelligence” could be of great use. This term today has been watered down to mean something quite different than when it was coined more than 60 years ago.

The central issue here is how to vet the processes from phenomena to causal conclusions.

For example, I have a bad kind of cancer (mucosal melanoma, stage 4) which is being held at bay by immunotherapy after an operation. I have tri-monthly full body PET scans and head MRIs that watch for indirect activity that might indicate that tumors are trying to form. I have heard that the minimal resolution is about a square millimeter (which might be about 100,000 cells) and this little white spot could be missed, even by expert humans.

The current “ML” techniques — when they work — are essentially “curve fitters” and transformations from thousands of parameters into a few categories. This should be very useful in medical scans, etc. But “correlation doesn’t mean causality”.

The Large Language Models are mostly syntactic and there is little in the techniques to actually create trustworthy models of the world — especially at the level needed in medicine.

Because we humans have many cognitive biases — a dangerous one in medicine is “confirmation bias” — it seems to me that using an LLM to produce what looks to be a diagnosis is a very bad idea with the current state of the art (and this includes cases where the diagnosis happens to be accurate: it is not accurate for the reasons medicine needs).

A better approach would be to make hybrids of MLs for “noticing” (Kahneman’s “System 1”) and actual cognitive models of medical knowledge to assess what is being noticed (Kahneman’s System 2 for “Slow real thinking”). This would allow important things to be noticed and then critiqued (without “hallucinations”) for a human doctor to then peruse.
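A schematic sketch of that hybrid (in Python; every name, rule, and number here is hypothetical, and nothing below resembles a real medical system): a statistical “noticer” proposes findings, an explicit and inspectable knowledge model critiques each one, and only the paired result goes to a human doctor.

def ml_noticer(scan):
    # Stand-in for a trained detector ("System 1"): fast, fallible pattern matching.
    return [{"site": "liver", "size_mm2": 1.2, "score": 0.81}]

def knowledge_critic(finding, patient_history):
    # Stand-in for "System 2": explicit rules/models that can say *why*
    # a finding is or is not consistent with the patient's situation.
    reasons = []
    if finding["size_mm2"] < 1.0:
        reasons.append("near scanner resolution; could be noise")
    if finding["site"] in patient_history.get("prior_tumor_sites", []):
        reasons.append("site matches prior disease; raise priority")
    return reasons

def review_queue(scan, patient_history):
    # Nothing is asserted as a diagnosis: each noticed item is paired with
    # its critique and handed to a human for the actual judgement.
    return [(f, knowledge_critic(f, patient_history)) for f in ml_noticer(scan)]

for finding, critique in review_queue(scan=None,
                                      patient_history={"prior_tumor_sites": ["liver"]}):
    print(finding, critique)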

I also have hypertension, and know only enough to realize how complex the actual physical systems (and the measurements on them) are for any useful model of a patient to be made. For example, some interaction between my cancer drugs and my long term use of hypertension meds caused precipitous drops in BP, and I not only would faint every once in a while, but my left leg would “go away” for a while. While this was going on, there was no guess that the problem was my BP dropping until I fainted while in a hospital and they found how low it was.

Doctors will appreciate that we still don’t know why it happened, and what finally worked was to just cut my former BP medicine in half. Many such mysteries still exist in the complex feedback mechanisms in e.g. the endocrine and immune systems. (We need much better models!)

Right now, the responsible response of medicine to what is called “AI” today should always strive for safety, reliability, stability, scalability, explainability, etc. before using it as a shortcut (which will often be dangerous and against the spirit of the Hippocratic Oath).

---

To everyone worrying about my cancer — please don’t. I’m in my middle 80s, and was expecting that this could be an “eventful decade”. Before 2013 it most likely would have killed me in months, but the introduction of “immunotherapy” has made a difference for a percentage of those with this type of cancer. There can be side-effects from the operation and the boosting of one’s immune system (and I’ve gotten a few of them). I think many diseases are tragedies when children get them, but are part and parcel of old age, and its inevitable end-game. I am very thankful that I did not get either dementia or arthritis — and so can still play music; compared to these, the cancer is much to be preferred.

Was there ever a serious attempt to a "multiple inheritance" Smalltalk cc: Alan Kay, Dan Ingalls? If there was, why didn't it work out?

Bob Taylor — who set up Parc computing research — was one of the directors of ARPA-IPTO (and, while there, among his many good deeds were heavily supporting Doug Engelbart and Dave Evans, and getting the ARPAnet started). As I’ve written elsewhere, IPTO had “visions, not goals” and “funded people, not projects”. So if anyone asked Bob what they should do, Bob would just say “Follow your instincts”.

My instincts were about most aspects of personal computing and systems, and they were especially magnified after meeting Seymour Papert in 1968 and seeing what he was doing with children and why LOGO had been developed to help. This led to the idea of a future “Dynabook” for “children of all ages” that would be “personal computing” in the form of a tablet and stylus computer of the future.

Influenced by Sketchpad and Simula I in 1966, I got very interested in the notion that “every SW idea could be represented as a network of intercommunicating virtual computers”, and — after meeting Papert — that “computing is more like a medium than a tool” — and if so — like reading and writing — ways must be found to not only include children, but to start early with children (as R & W does).

This led to many early goals. For example, programming should not just be for making things, but for making beautiful things, and learning to program in a good children’s language would mean learning many important ideas about larger things: processes, meaning, systems, thinking, math and science, meta things, etc.

For example, a neat goal would be to invent a language in which “simple things were simple, and complex things were possible”. This was already the case with LOGO (which combined semantics from LISP, and some of its appearance from JOSS). I could see that the “everything is a virtual computer” idea could be “LOGOfied”, and this could lead to a language that (a) could work for children, (b) could be written in itself all the way down to the metal, (c) would let the kids explore every level safely, and (d) would have a structure that is itself illuminating … Etc.

By far the best and most interesting examples of these larger things intertwined with computing were to be found in the ARPA research community, and especially in its “real AI” research community.

One of the earliest big ideas in computing took quite a while to appear in a full-fledged form — this was “general instantiation” (this is curious, because the desire to use things over again and to make similar things came early).

Some of the earliest uses of this idea in a strong way happened with the 1962 Sketchpad and the first Simula (ca 1964) — and we should not forget COBOL and its need to deal with record formats and instances of them. However, Algol 60 and most other languages didn’t have it, and the languages that could express it — like LISP — “didn’t know they could”. Interestingly, the advent of time-shared multiprocessing used instantiation to make multiple virtual machine processes at the operating system level.

Nygaard and Dahl, the inventors of Simula I (which appeared in the US in 1966), realized that they could get parallel virtual machines if they broke the Algol block mechanism loose from the stack discipline storage allocation scheme. This was a huge leap, and provided a way to think about programming such structures that the magical Sketchpad had shown a few years earlier. The result was a kind of hybrid, but mathematically minded observers could readily see that everything in Algol could be replaced and defined by this single idea — and this would simplify everything greatly (except that a lot of new software engineering would need to be done to make this practical).
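A loose analogy in a modern language (Python generators standing in here; this is an illustration of the idea, not what Simula actually did): an ordinary call’s locals vanish with its stack frame, while an instantiated “block” keeps its own variables alive as an independent little virtual machine.

def stack_counter():
    count = 0              # lives in a stack frame...
    count += 1
    return count           # ...and is gone after the return

def instantiated_counter():
    # The generator's frame lives on the heap: its own state, created by
    # instantiation, independent of the caller's stack discipline.
    count = 0
    while True:
        count += 1
        yield count

c1 = instantiated_counter()
c2 = instantiated_counter()           # many parallel instances of one "block"
print(next(c1), next(c1), next(c2))   # 1 2 1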

My earliest thoughts about a children’s language along these lines happened after the Dynabook idea and were very much catalyzed by (a) how LISP was set up, (b) how LOGO had been derived from LISP, and (c) what was going on in AI with regard to modeling situations and reasoning about them. I called the first designs “SLOGO” (for “Simulation LOGO”, and because I had a strong feeling that the first versions would really be “s l o w”).

Now let’s fast forward a few years to Xerox Parc, and a chance “hallway meeting” with Dan Ingalls and Ted Kaehler, and a discussion about “the most powerful language in the world”. Because I had studied LISP and was aware of McCarthy’s half page (page 13 in the LISP 1.5 manual) definition, and had realized that you could get full objects by looking at what McCarthy did a bit “sideways”, I asserted that you could get the “powerful language” in about the same amount of definition. This led to about two weeks of work to come up with a similar definition that was even simpler and included the new ideas. A month later I came back from a trip and found that Dan had implemented the definition, and we now had a language system we could program in, and extend. This was the first working version of Smalltalk.
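Here is a toy sketch (in Python, and emphatically not the actual first Smalltalk) of what reading such a definition “sideways” can yield: instead of one central interpreter deciding what every expression means, each object carries its own behaviors, and all computation is messages sent to objects.

def send(receiver, selector, *args):
    return receiver["methods"][selector](receiver, *args)

def make_point(x, y):
    return {
        "state": {"x": x, "y": y},
        "methods": {
            "x":    lambda self: self["state"]["x"],
            "y":    lambda self: self["state"]["y"],
            "+":    lambda self, other: make_point(
                        self["state"]["x"] + send(other, "x"),
                        self["state"]["y"] + send(other, "y")),
            "show": lambda self: f"({self['state']['x']}, {self['state']['y']})",
        },
    }

p = send(make_point(1, 2), "+", make_point(3, 4))
print(send(p, "show"))   # (4, 6)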

Working with Dan has been one of the great pleasures of my life (and we are still very close friends and colleagues). Dan has been a genius at many things, but to pick just one standout ability: he just naturally converges on making working extensible systems that others can use (these are far more than demos, and they have led to many great and useful ideas).

I loved instantiation, but did not like the way Simula 67 did inheritance. For example, there was no protection of “meaning of species” when one subclassed. For another, the subclassing could isolate a trait that seemed as if it should be more of a parameter than a subclass. And so forth. Multiple inheritance could fix a few of these, but with more problems. For example, what if each inheritance chain had a method with the same name (e.g. “print”)? What meaning should the system try to come up with if “print” was sent to such an object? Maybe sequential application? But there were many cases where the collisions of methods were much more non-linear and confusing. (This was ugly, and certainly nothing to put in front of a child.)
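A tiny illustration of the collision problem (Python resolves it by a linearization policy, i.e. whichever parent is listed first wins, which is a policy rather than a meaning; the class names are made up):

class Persistable:
    def report(self):
        return "write me to disk"

class Displayable:
    def report(self):
        return "draw me on the screen"

class Document(Persistable, Displayable):
    pass

# Both parents define "report"; the system must invent what the child means.
# Other plausible policies (run both in order? merge the results?) are just
# as arbitrary, and nested cases get even more non-linear and confusing.
print(Document().report())                      # "write me to disk"
print([c.__name__ for c in Document.__mro__])   # the linearization Python chose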

However, the “LISPiness” of the first Smalltalk allowed many kinds of late binding to be done, and that allowed us to do many experiments with various forms of “inheritance”, including: delegation, “slot inheritance”, “exemplars” instead of classes, etc.
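As one example of those alternatives, a minimal sketch of delegation (Python used only for illustration; the names are invented): an object forwards messages it does not handle to a “parent” object that is chosen, and changeable, at run time.

class Delegating:
    def __init__(self, delegate=None):
        self._delegate = delegate

    def __getattr__(self, name):
        # Called only when this object has no such attribute of its own:
        # forward the "message" to the delegate, if there is one.
        if self._delegate is None:
            raise AttributeError(name)
        return getattr(self._delegate, name)

class Printer:
    def print_it(self):
        return "printed via the delegate"

doc = Delegating(delegate=Printer())
print(doc.print_it())   # forwarded at run time, not inherited at class-definition time
doc._delegate = None    # the "parent" can be swapped while the system runs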

Here is another Quora answer of mine about inheritance:

https://www.quora.com/What-does-Alan-Kay-think-about-inheritance-in-object-oriented-programming

One thing we realized is that “the cleverness you can come up with whilst programming, is rarely matched by the understanding of people looking at the code afterwards” — so “if something can be done, it doesn’t mean it should be done”. This especially applies to meta, even if meta is available.

It was amazing that Xerox funded the many Altos, but they balked at funding a next gen machine just a few years later. This meant that researchers like us had to now optimize research code in order to make progress. “Once you put on your optimization hat, it is hard to think research thoughts”.

This led to a crisis, and once again Dan saved us by making a kind of Smalltalk (Smalltalk-76) that pinned more things down early, but preserved much of the original flavor. This was much faster and larger, but more care had to be taken with experiments. One decision was to try “single inheritance with lots of design”. As done by Dan, this was a bit of a miracle (which he had to reconstitute every few years from the mess that the rest of us started to create).

One of the things that turned a possible disaster into a success was that the larger goals of the group were not primarily about programming language design, but about “personal computing for novices who could grow”. This meant that most of the experiments we did — successful or not — were not incorporated into whatever was the Smalltalk of the day.

My bottom line on all of this is that inheritance is a good idea if there is a carefully worked out set of guards and principles — a kind of “algebra”/”constitution” — to preserve meanings, deal with non-linearities, etc. In other words, something that could lead to a comprehensive and powerful enough and compact enough ontology that would make things more clear (as opposed to just “more powerful”).

A final note here: it’s not just making a powerful system that should be the main goal, it’s “making a reformulatable system that should be the main goal”. In other words, if things are going well, there will be a learning curve that one will want to incorporate gracefully in the system. In a new discipline with a learning curve “Power is fleeting, change is required”.

Now that machines can generate and "understand" language, what would Alan Kay argue that are a set of first principles from which to rethink the human-machine interface?

One of the ways to spot many LLMs is to note that they often don’t handle “gender anaphora” well (meaning that a sentence about a man will be followed by a sentence also about that subject but using “she” instead of “he”). This is really ugly and bad (especially considering that this was done much better before ML).

I don’t see any evidence of actual “understanding”, either of language or, especially, of the world and situations that we live in.

Here’s one way to think about “What is actually needed?” (WIAN). Consider actual humans trying to live together.

They are somewhat clever, can use language a little, are not terribly knowledgeable, think their beliefs are actually “reality”, etc.

I don’t think a society with just “traditional/genetic traits” is going to work very well without something like a “constitution” (whose first version might be simply an agreed on set of customs for the societal culture). These customs will include “rights and wrongs”, ways to negotiate, etc.

In other words, once you have a system, it is really beneficial to set up tests and feedback loops to deal with safety, stability, scaling, etc. Soon there is a realization that “laws” are not nearly enough — what is needed is a “constitution” which is a philosophical framework laying out the desired norms of the system, ways to deal with instabilities, and how specific laws can be derived from the constitution. I.e. the constitution is a kind of language and operating system kernel to be a basis for many things in the present and new challenges in the future.

As an old fogey, what I think I’m seeing today is cleverness without enough context in the age of the industrial revolution.

What I want in a computing system I can work with, is something that is much more trustworthy and understandable than most humans and societal systems. I certainly don’t want to work with something that is extremely dangerously below expectations and wishes even for humans.

I think that too many people are hooked on “doing something/anything” rather than taking the time and resources to find what is actually needed to do, and then to do it “well and carefully”.

How do I write a program to evaluate the arithmetic statement x = (A + B * C) / (D - E * F + G * H)? a. Using a general register computer with three-address instructions. b. Using a general register computer with two-address instructions?

This sounds like a homework problem. Should you really be asking such questions on Quora?

What was the first PC: the Kenbak (according to the ACM and Steve Wozniak), or the LINC (according to Alan C. Kay)?

First, we have to decide on the term “personal computer”. I have defined it as something one can easily have in their house, made in some quantity, that has at least textual I/O, and on which real programs can be written.

A really early one would be the Bendix G-15 in the 50s, done by the even earlier computer pioneer Harry Huskey.

image

Thanks to Peter David Hill for reminding me of the Librascope/Royal McBee LGP-30 (1956), and to Scott Carter for pointing out that Librascope was the main factor in the creation of this machine (see comments).

image

This was also the computer that Margaret Hamilton first used to learn programming.

If we want to add a real display, paged swappable memory, file storage, and quantities of 1,000 to 2,000, then there is Wes Clark’s LINC in 1962.

image

Mary Allen Wilkes and her home LINC (she designed and programmed the OS)

I think of the LINC as the first real “personal computer”.

Here’s another “earlier than Kenbak” personal computer: the Datapoint 2200, first sold 1970. It was actually a “programmable terminal” that was strong enough to be used stand-alone.

image

If we remove the necessity for interaction (which I consider critical to the definition), then we can look at just “computers you could have at home”. One of the very first in this limited sense would be Ed Berkeley’s “SIMON” from the early 50s (the magazine below is dated Oct 1950):

image

One of the first to write real programs for this was Ivan Sutherland when he was about 11 or 12.

Note that this exhibit is (a) from the Computer History Museum, and (b) is earlier than the Kenbak (actually by quite a few years) — I helped build one around 1954.

image

So I don’t understand why the question above claims that the ACM thinks that the Kenbak is earlier (this feels made up to me).

In any case, the G-15, the LGP-30, and the LINC all very much predate the Kenbak, and they are much more the real thing IMO.

How can anyone take math and physics seriously enough to devote a career to them, (overwhelming evidence that the material universe obeys mathematics) without positing that abstract entities have an independent existence outside the human mind?

Many of the answers to related questions are worth reading and pondering. For example, see what James Fullwood has to say.

Here is what Einstein had to say early in a lecture to physicists at the Prussian Academy of Sciences in 1921.

First, he noted that:

“But there is another reason for the high repute of mathematics: it is mathematics that offers the exact natural sciences a certain measure of security which, without mathematics, they could not attain.”

But, then he reminds them of one of the most important epistemological ideas/principles of science:

“As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality.”

There are many important ways to contemplate this. One is the analogy — even overlap — of maths to maps — and that “the map is not the territory” [1] — but a really good map can be of critical help when exploring and explaining a territory.

Another — larger — notion to ponder is that pure maths can be made “certain” because they are creations of the human mind in which every part — including representations and inference — is grounded in given definitions. There is no outside world. This is part of what goes on “between the ears” of humans.

In other words, we can make maps of imaginary places that obey all the logical rules of maps of actual places, e.g. Tolkien’s “Middle Earth” map obeys map inference operations such as “if the SHIRE is north of FANGORN, and ERED LUIN is north of the SHIRE, then FANGORN is south of ERED LUIN” etc.

image

We have never found anything on our planet that resembles this map, but with the map alone, we can’t say whether the map has a referent or not. (It seems not.)

This is why the Greeks (and Thomas Aquinas, etc.) got in trouble by thinking that “perfect reasoning” was enough to explain our situation. (We can only make up “perfect stories”, with regard to consistency, etc.)

We do real science by trying to find relationships — especially causal ones — from limited and noisy phenomena. Much of mathematics started in attempts to symbolically make models that tried to mirror phenomena, and especially to be good enough to predict some phenomena, and — often — to find interesting differences between the predictions and the phenomena.

We should be able to see that we are now “in a kind of negotiation” between “what’s out there?” and the kinds of worlds we can build “between our ears”. This is very different from mathematics all by itself (because “proving something” in the math of science, means very little without seeing if the phenomenological world seems to match up). This is one way to look at the meaning of Einstein’s second quote.

I once had a nice long chat with Steven Weinberg about this and that. One part touched on his book “Dreams of a Final Theory: The Search for the Fundamental Laws of Nature”. I pointed out that one of many curiosities of this book was that in one chapter he would say that “the math is not the universe (the physics)”, and in the next chapter would act as though “the math is the universe”.

He said, “Yes, this is a huge problem for most particle/quantum physicists: we don’t have anything but the math to think with, and because our brains want to believe things, we start believing the math”. (He liked Einstein’s second — and wonderfully explanatory and cautionary — quote as a way to get recentered on what we are actually doing when we are trying to do science with the help of math.)

P.S. There is no “overwhelming evidence that the material universe obeys math”. It is actually quite the opposite. The universe is whatever it is, and we make up language and notions to try to create models that can mimic some of it. It’s quite wonderful that we can do some of it, and this is why some people devote their life (it’s a lot more than a “career”) to science.

[1] Alfred Korzybski “Science and Sanity”

What question(s) might Alan Kay like to answer that he has not yet been asked (publicly)?

Thanks for asking, but if it worked that way, I would be continuously writing essays, etc.

My odd motivations include the desire/urge to answer questions plus the fun (and necessity) of getting questions from “outside”. Quora supplies these, along with the need for brief writing (which I’m trying to improve).

How did ARPA/IPTO & Xerox PARC decide who worked on what? Were there conflicts over who got to do what?

An excellent and interesting question.

The computer research part of Parc (two labs out of four) was mostly staffed from the ARPA-IPTO community, and the researchers considered themselves part of it — and, most especially, considered that they were there to carry out the “ARPA Dream”, which was initially formulated by JCR Licklider (“Lick”) as (my phrasing):

Computers are destined to become interactive intellectual amplifiers for everyone universally networked worldwide

Here are more ARPA-IPTO principles of the 60s (my phrasings):

Visions not goals — “Visions: Cosmic and Romantic”

Fund people not projects

Only fund the very best people

Fund Problem Finding — not just Problem Solving

No peer review

An important part of the research results are new and better researchers.

Really Advance Something Very Important

Design and build in the future and bring back to the present

“Living Lab” – make enough of the inventions so that many can use them.

Milestones not deadlines

It’s “baseball, not golf”

… and so forth.

The tricky part to describe here is that the researchers themselves decided on what to work on. This meant that how the researchers were chosen was the most critical process. As Patrick Scaglia of HP Labs (another great research manager) liked to say, “Hiring is 110%!” (This is the “fund people, not projects” principle above.)

One way to think about this is that “Artists are people who can’t not do Art”. They tried to choose these kinds of people, who wanted to do their Art in the context of the Vision, and “who looked like they could do something”.

Bob Taylor — who set up the computing research at Parc — had been the 3rd director of ARPA-IPTO and had codified Lick’s intuitive principles. He never chose a goal, but as he said “My job is to create an environment such that when my lone wolves need to cooperate, they will”. When asked his job description he said “I’m a kind of impresario”.

This meant there was no party line for project goals, and this allowed more than one kind of project to address parts of the vision. So there were multiple language projects, multiple display and UI projects, etc.

Why was there almost no rivalry?

Basically a combination of mutual respect, no competing for funding, etc. At Parc Taylor decided to eliminate rivalries from the start by requiring that all the researchers be really enthusiastic about the next hire. I.e. a single negative vote and a prospective researcher was not hired. This made the hiring process slow and laborious, but the new person was welcomed with open arms by all — and this made a tremendous difference.

The only main rule at Parc was that everyone had to attend the weekly meeting — called “Dealer” — where Chinese food was consumed, all things were discussed, talks were given, pledges were made and unmade, argumentation was encouraged (not to win, but to illuminate — more below). These meetings were on Wednesdays, started at noon, and went on until all outstanding issues were taken up, etc. Excellent minutes were taken by volunteer “scribes”. Etc.

Another fun notion came about because Xerox allocated people not by funding but by “headcount” (called “slots”). This meant that when a precious “slot” appeared, the tendency was to recruit another star. And this meant that we were always “tech support light”.

This was discussed quite a bit — at some point it was realized that we all had been “tech support” earlier in our careers and still liked many parts of it. This led to the idea that Parc researchers could spend up to about 40% of their time “being hands” for other projects, as recreation. The nice thing about this was that the change from being “creative” to being “support” was very restful. For example, I really liked to solder, so I could help with soldering needs on other projects part of each day. One could make a “public pledge to help” in a Dealer, and at some point, one would “unpledge” when one felt there was no extra time available.

ARPA-IPTO liked to “argue for illumination”, and was very argumentative. The level of this at Parc shocked visiting Xerox executives. Taylor had refined this to handle situations when the arguers had forgotten about “illumination” and started to try to win. He would call out “Type 2!” This meant that now each arguer had to explain the other person’s point until the other person agreed, and vice versa. By the time this was done, the heat of the moment had subsided, and clarity was again the goal.

By my reckoning, ARPA-IPTO and especially Parc, had many of the absolute top minds in computing. The process made it not just easy to cooperate, but also created enormous synergies. This included for example, people with different notions of what a new kind of programming language should be like actually helping each other to make their respective notions work.

Resource limitations could be a source of conflict — but Taylor kept Parc small so that he could get super-researchers within the budget. This worked extremely well (socially), even when Xerox would crimp the budget when they went through difficulties. Towards the end of the first 10 years, this gradually deteriorated.

After ARPA became DARPA ca 1970, it no longer operated as it had (and this was one of the reasons Parc came into existence). After Parc (ca 1982) there were no large enlightened funders left. The individual researchers carried on in various places, but the large “communal” critical mass never happened again. The way I think about it is that though I still had a research group of top people, I could no longer walk down the hall to get Butler Lampson to tell me why my “idea X” wouldn’t work, and then start thinking about why it really would work (Butler was great at both arguing and being honest when the argument changed.)

One fanciful way to think of the change is that the funders no longer wanted artists to create beautiful murals, now they just wanted painters to paint the walls a certain color.

This leads to Principle 0:

0. The goodness of the results correlates most strongly with the goodness of the funders

How do you think the rise of AI personal computers will impact the PC industry?

Anthropology was one of my minors in college, and the head of the department had a Macaque monkey as a pet, and would occasionally bring it to class. When we asked him about his experience at home, he said “It’s like having a dozen cats with hands”.

The current technologies seem terrible to me, and I don’t think they are generally helping except where they can be closely vetted: to e.g. help find white patches on an X-ray that shouldn’t be there.

This is a little bit like an industrial revolution version of mass produced Trumps, but able to traverse the world via the Internet.

I advise reading the psychologist Gary Marcus, who has been commenting and chronicling the myriad failings and dangers.

https://garymarcus.substack.com/

To what extent did The Inner Game of Tennis shift PARC’s point of view?

It had a huge influence on my point of view — I saw the Harry Reasoner show with Tim on it and read the book ca 1975 — but I don’t think that it affected anyone else at Parc while I was there.

---

However, a few more words are in order. TIGOT indirectly affected some of the most important work at Parc, especially in the UI and Learning areas. One of Tim Gallwey’s nicest observations — “The problem with most learning is that the parts of the brain/mind that have to do the learning, don’t understand English!” — resonated very strongly — and hooked up nicely with Jerome Bruner’s ideas about “multiple ways of knowing” and “multiple minds” between our ears. This led to the “multi-modal” GUI and the slogan “Doing with Images makes Symbols”.

Another idea we really liked was Tim’s tactic of finding an activity for the learner that would simultaneously distract the non-learning part of the mind that was trying to control things while drawing more focus and attention to what the learning parts of the mind needed. I believe that we could have done a lot more with this idea than we did.

The question says “PARC’s point of view”, and I don’t think this was shifted, but it was certainly the case that Tim’s work greatly influenced both me and my research group (the Learning Research Group).

---

I should also mention that Tim Gallwey and TIGOT were huge influences on the research groups I set up after Parc, especially for Apple and later at Disney. I hired Tim as a consultant and advisor, and his ideas were important parts of the larger worldview of this research.

Is there any hope of a hardware company making something like Barton's computer hardware now?

The central idea and issue here is that exponentially massive scaling changes the design issues and design rules every few years.

Bob Barton and his colleagues at Burroughs responded to the first really high level “tidy” languages — e.g. Algol-58, and then Algol-60 — with a “systems response” to their challenges, in which part of the result was a new environment and meeting place for hardware and software. This was amazing for its time (1960–62), and still enlightening and thought provoking today.

However, by the mid to late 60s, it was possible to use newer knowledge to come up with both better notions of programming languages and better notions of where languages and hardware could now meet. For example, dynamically loadable microcode into a faster than RAM buffer was possible in the late 60s, and allowed better designs all around (Parc took advantage of both what Barton did, and the newer components, to come up with stronger combinations, etc.)

And Bob himself helped invent — secretively — such architectures for Burroughs during this time (we didn’t know exactly what he was working on, but it turned out to be what he later called a “configurable” computer (the “1000 series” of designs)). Burroughs tried to do a lot of things with this approach (and it’s interesting to contrast it with the much simpler/parsimonious work of Chuck Thacker at Parc on the Alto, and later machines).

The exponential nature of Moore’s Law makes each HW “era” just a few years for each particular opportunity for getting substantial leverage from “brilliant design”.

One of the “rubs” here is that there are large learning curves required, and historically, only a fraction of researchers did the work (and wanted to do the work) required to “re-wire” their instincts and points of view. Part of the lure to this small percentage is the analogies to “truth and beauty”. The main goals of “industry” are quite different, and are aimed at “pools of customers” (and also “pools of workers”) — and how to deal with “legacies”.

There have been 60 years of Moore’s Law since the early 60s, and — conservatively — this means about 40 doublings (20 doublings is about a factor of a million), distributed over increases in computing power and capacity, and decreases in price. The universal nature of the microcomponents means that very conservative next level designs will still speed up and cost less. In practice, this has meant almost a freeze in actual progress in programming language designs, etc., over the last 40 years.

So the questions I would ask first would mainly have to do with “how should we be programming today?”, and “what should computer systems look like today?”.

By the way, the big deal with Bob Barton was that he was always asking these questions, and he was always extremely dissatisfied with everything in computing, including his own brilliant work (which he could clearly see was “mainly good for its brief time”, and which needed to be superseded often). Most of us at Xerox Parc felt quite the same about what we accomplished in the 70s.

P.S.1. It’s worth noting the lag between “serious graphics” and GPUs. We can also note the appearance of special HW for doing low res vector/tensor processing for ML (and here we could “ask questions about the worth of the questions …”).

P.S.2. Bob was a mathematician and “systems guy”, so his instincts were aimed at places in desired HW-SW systems where key needs were creating bottlenecks (and even neglect), and where “math” (relationship insights) could be rendered in HW to make a big difference. I urge interested readers to really take a look at the “Barton architecture” (he gives Ben Dent credit for helping with the ideas) that was used in the B5000 — and to especially look beyond the use of a stack, postfix code, etc. to see how protection, processes, and virtual memory were done. His classic very short paper (4 pages of “math”) from 1961 is cryptic, but is worth grokking (alongside a more conventional account of the B5000).

Do you believe it is ever a good idea to use a GOTO in a C/C++ program, why?

First, check out “Zahn’s Device” (Wikipedia calls this “Zahn’s Construct”). This will take you back to this ancient controversy, and some interesting workarounds.

I started as a machine code programmer in the early 60s, and gotos (JMPs) were very much used.

It’s worth looking at the voluminous machine code listings of the Apollo Guidance Computer, and reflecting that they never had an actual software bug in the history of the moon shots (the famous glitch on the moon landing was a procedure error by an astronaut, and the software was actually able to fail-safe the error in real time).

The issues about whether to use them in higher level languages that had more structured ways to change flows of control revolved not around the goto itself, but around whether at any given point in a program a programmer could have an accurate picture of the current state of the variables. Most gotos were written so as not to create mysteries at the other end of things — in other words, the goto was not so much the problem as the “where from” questions in the midst of code.
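
As a concrete illustration of that distinction, here is a minimal, hedged sketch in C (the function name process_file and the file name "example.dat" are made up): the one goto use that is still widely considered reasonable is the forward jump to a single cleanup point, where the state of the variables at each label is obvious to the reader.

    #include <stdio.h>
    #include <stdlib.h>

    /* Every goto jumps forward to one of two clearly named cleanup labels,
       so there is no "where did we come from?" mystery at either label. */
    int process_file(const char *path) {
        int result = -1;
        char *buffer = NULL;
        FILE *f = fopen(path, "rb");
        if (f == NULL) goto done;             /* nothing to clean up yet */

        buffer = malloc(4096);
        if (buffer == NULL) goto close_file;  /* only the file needs closing */

        printf("read %zu bytes from %s\n", fread(buffer, 1, 4096, f), path);
        result = 0;

        free(buffer);
    close_file:
        fclose(f);
    done:
        return result;
    }

    int main(void) { return process_file("example.dat") == 0 ? 0 : 1; }

Contrast this with a backward jump into the middle of a loop, where the reader has to reconstruct what the variables could possibly hold at the target.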

The many kinds of race conditions often led to really careful programmers not assigning to the same variable more than once in a loop, etc.

The biggest problem with “taking care” and then letting time pass, is that the exact reasons why something is OK get forgotten, and changes often “don’t take the same care”.

Then there is the observation of Tony Hoare (Turing Laureate) that “Debugging is harder than programming, so don’t use up all your cleverness programming”.

When programming is looked at longitudinally, it looks (and is) often really bad. One reason for high level languages is to deal with “the programmer problem” by walling off dangerous stuff wherever possible (this will include not just control, but storage management, etc.). This should nowadays exclude most sequential “munge-variable” programming, and dangerous control constructs such as “goto”.

When was the first computer invented? Who invented the Internet?

I’ve answered the first question a few times on Quora.

By just saying “computer”, we can look far back in the past to find mechanisms invented by humans to help them reason and calculate. Two very old ones (B.C.) are the abacus — not just in China, but in Greece and Rome —

image

and the amazing Antikythera mechanism which seems to be a very sophisticated calculator for the apparent movements of the sun, moon, and planets.

image

If we were to look for “programmable computer”, and give credit for early ideas, we have to mention Babbage’s “Analytical Engine” which was intended to be programmed using the techniques of the Jacquard Loom which had sequences of punched cards to program patterns woven by looms.

If we include credit for actually making a device that worked, then we could pick John Atanasoff and Cliff Berry (1939) or Eckert and Mauchly with ENIAC (ca 1945). Personally, I don’t think either one of these machines is programmable enough.

For “semi-programmable” computers, I prefer Colossus by Max Newman (and especially) Tommy Flowers at Bletchley Park.

However, my pick is Konrad Zuse who — working by himself — made several programmable digital computers (the Z3 in 1941) and even invented “the earliest” higher level programming language — “Plankalkül” — The Computer History Museum Fellow citation sums it up: "for his invention of the first program-controlled, electromechanical, digital computer and the first high-level programming language, Plankalkül."

The Internet — as with so many of our modern computer technologies — was done in the context of a research community — funded by ARPA, with Parc funded by Xerox. The way these highly productive communities work is that ideas are freely shared, there is a lot of process, designs, testing, etc. and there are usually a few “special sparkplugs” amongst the general high talents that make a difference.

Awards often miss some who should be included, so we should look further than e.g. the NAE Draper Prizes for the Internet and for Personal Computing.

The Draper Prize for the Internet was awarded to Vint Cerf, Bob Kahn, Len Kleinrock, and Larry Roberts. To this, we could add names such as Paul Baran, Donald Davies, some of the ARPAnet pioneers, and several of the Parc networking researchers who also contributed to the Internet (e.g. Bob Metcalfe, Butler Lampson, etc.).

My favorite “special sparkplug” from this esteemed group is Vint Cerf, who both (a) was the co-creator of the TCP/IP protocols, and (b) ran what was perhaps the most productive of the group efforts to nail things down into workable forms.

Isn't artificial intelligence just a fancy name for sophisticated programming running in fast enough computers?

This is a pretty good question — especially because you are trying to separate labels from actual meanings.

I think homing in on a good definition of “intelligence” will help you answer your question for yourself — especially if you can come up with a good characterization of a goal for “artificial intelligence”.

I was once in a meeting in the 60s that included heavy duty military officers and a few AI researchers (I was neither: just “a systems guy”). One of the points one of the generals made was that they really had very little use for a computer simulation of normal human intelligence (because they could draft and train soldiers for these needs and tasks). What they really wanted — he said — was “superhuman intelligence”.

Another point of view is that we might be able to use the term “intelligence” when a machine can be made to do something that a human would need intelligence for. A very simple example used long ago by John McCarthy (who coined the term AI) was that of a house thermostat. It could sense the temperature and take actions to turn on or off the heat to try to keep the ambient temperature within a certain range.

I don’t think that is enough. If a machine can beat a human at chess, is it intelligent? I would argue that this is like comparing a race by a human and an automobile (it doesn’t make sense because it is a category error).

The Boeing 737 Max has been in the news frequently over the last few years — most famously for having an autopilot that would e.g. try to correct for a stall condition by diving the plane to get more air flowing over the wings (pilots know this is a good general idea — but pilots also know (a) this is a terrible idea if the plane is only a few hundred feet off the ground, and (b) there are human beings on board who will be killed thereby). In this case the autopilot is quite stupid enough to be cheated at cards by the thermostat! Boeing allowed this to happen twice (meaning their executives are all too close in IQ and ethics to the autopilot).

Years ago I suggested that the term AI be replaced by “flexible competence” — this has the advantage that each part of the term has more meaning and can be measured in more meaningful ways. Note that we could even apply this to engineering artifacts like drawbridges: they are competent in two distinct ways, and employ flexibility to accomplish this (they could be the new “thermostat” in a new metaphor).

I think the bottom line for any kind of engineering can be taken from the Hippocratic Oath in Medicine (itself a kind of engineering science): “Above all do no harm”. The bridge must not collapse, the plane must not crash, the building must not fall, computer programs must be safe, and be able to prove they are safe, etc.

I think the previous paragraph is the best answer I can come up with for your question.

What was the best computer system you worked on (or with) and why? For "best" and "computer system" pick your own definitions. 🙂

Thank you Mark!

For me, questions like this are in the same category as “What is the best book you have read?”, “What is the best food you have eaten?”, “What is the best musical instrument (or even “best guitar” or “best pipe organ”) you have played?”, or “best performance you’ve been involved in?”, and so forth.

I will try an elliptical answer first, and then see if the same logic can be applied to computing experiences.

(Leaving out a lot of backstory detail here) long ago, there was to be a production of “The Fantasticks” when I was in the USAF in San Antonio, and they couldn’t find a harpist (after needing a concert pianist — which they did have — it is the next key instrument in the score). I volunteered to take a shot at it (and the wonderful pianist thought it was funny, was game, and helped). Our show ran for 56 performances (and closed only because the theatre had already booked another show).

As we entered the last week of this, there were literally tears all around — cast and musicians — from the thought that in a few days we would not be able to continue. This can be hard to explain, but it is well known in many art forms: it’s a combination of basic material being so good that the more you contact it the more it amazes and gives back — and — that each encounter is different: the audiences are different, and the performers are being changed by the experience and life in general, so each show is different (and also it is often the case that memories don’t just come up with one performance that stood out, but many — because real art has so many many aspects to give and to love).

Similarly, one’s memory of general performances doesn’t just single out one material — i.e. “The Fantasticks” is a kind of theatrical miracle, but it is far from the only theatrical miracle.

This obtains for books, musicians and composers, etc. I think it’s possible to invent a category named “wonderful”, but for me it is always populated by many examples. (So: Bach and a Schnitger type organ, Franck and a Cavaille-Coll type organ, and all the way to e.g. Mark Knopfler and his special musical vision via a special way to play a Strat — “wonderful” is a big idea!).

One final preliminary here concerns the human beings who are the central movers for art — it is hard to describe them other than “they are people who can’t not do art”. This makes this group very interesting, memorable, and (often) confounding.

Now let’s look at computing, and the questions about “best” and “computing system” and “why?”.

To me “computing system” is in the category that includes paper, pencils, pens, charcoal, watercolor, canvas, oils, and so forth. It is something that can be shaped into a manifestation of an idea. Note that the ideas in art almost always transcend the materials, and part of the artists’ processes are to find a manifestation of the idea/impulse in the materials at hand. I think most readers can see that rather often it is the limitations of the materials that really help the artist to shape.

So: too many “wonderful things” over the years to be able to mention.

To me, the big deal and reward is to be able to “do the work” by being able to follow one’s nose towards the “perfume”. There is nothing like it. “The center of ‘why?’ is Art”

For me, this was most fruitful in settings that were rather like theatre communities that had just enough funding to create what was needed and sought after. I’ve mentioned the ARPA/Parc research communities as a wonderful and long rich period. But, with the right people as colleagues, lots also got done in less felicitous environments. “Artists have to do art” etc.

I particularly enjoyed the aftermath of the “Disney Etoy Period”, when what we had created for 8–10 year old children — and which worked so well for them — was also comprehensive enough for me to use in my talks to adults, showing them examples the kids could do that were greeted by both gasps and applause (for not being easy or even a reasonable goal in the “professional systems”). This system was possible via one of the many great teams I’ve worked with.

That was one of many “Fantasticks” level eras for me.

Which personal computer was the first to use an operating system not based on Unix?

As I define “Personal Computer”, Wes Clark’s LINC from the early 60s qualifies as one of the first (see “The LINC Was Early And Small” by Wes in ACM’s “History of Personal Workstations”).

It had a display, a keyboard, virtual memory, etc. — and an interactive OS — which was called: LAP6.

One of the central figures in most aspects of this remarkable tiny machine — especially the software system — was Mary Allen Wilkes, a truly amazing computerist and programmer — “smarter than anybody” — and perhaps the first to have a personal computer in her home.

Here is her abstract of this system for her 1970 CACM paper on this OS.

LAP6 is an on-line system running on a 2048-word LINC which provides full facilities for text editing, automatic filing and file maintenance, and program preparation and assembly. It focuses on the preparation and editing of continuously displayed 23,040-character text strings (manuscripts) which can be positioned anywhere by the user and edited by simply adding and deleting lines as though working directly on an elastic scroll. Other features are available through a uniform command set which itself can be augmented by the user.

The machine, although small, aids program design by providing display scope and premarked randomly addressable LINC tapes as standard items, in an environment similar to that of a sophisticated terminal. The tapes are logically similar to a disk. Priority was given to the design of efficient tape algorithms to minimize the limitations of the small memory. Techniques developed for handling scroll editing, filing, and the layered system structure are outlined.

LAP6 is used by about 2000 people in 11 countries. Its design was strongly influenced by performance criteria established in interviews held with the LINC users themselves during the specification period.

Also, the Xerox PARC Alto was a personal computer, and predated 8-bit micros and most of Unix — and had a number of OSs, including ones with GUIs.

Numerous 8-bit micros also had OSs long before any Unix ran on personal computers.

What are some good biographies that every aspiring programmer should read?

Sadly, as with histories of our field/profession, most are not very good.

One I liked a lot — and I knew the author over many years as well — was “Computing in the Middle Ages” by Severo Ornstein, who was one of the main people on the LINC (the first personal computer), the ARPAnet at BBN, the laser printers and the Dorado at Parc. He is still kicking at age 93, thank goodness!

The best (only really good one) history of ARPA-IPTO and Parc is “The Dream Machine” by Mitchell Waldrop. We all thought it was amazingly accurate in so many respects.

The ACM’s Histories (Workstations, Programming Languages, etc.) are pretty good.

“A Mind At Play” is about Claude Shannon.

Good histories of machines include:

“The Whirlwind Project” — “The Apollo Guidance Computer” — SAGE Q7 (not the actual title).

There are a number of good histories about “how to pull off new big things”. For example:

There are several good histories of Radar, especially the one about Tizard, and another about Loomis (“Tuxedo Park”).

General Groves wrote a wonderful matter-of-fact book about the Manhattan Project, “Now It Can Be Told”.

There are several great histories about building the Empire State Building.

---

Mark Nobles asked in a comment below: “When will you finish your autobiography?”

My reply was:

I haven’t started one, and am still actually living a life of doing things I love … so I have a feeling I will never write one.

When Quora asked for a one-word “tombstone” description, I replied “Curious”.

When conferences ask for a “short bio”, I usually reply: “No one owes more to his research community than I do”. This sums it up for me.

However, the ACM (via Jean Sammet) did suspend my regular activities by “strongly encouraging” me to write “The Early History Of Smalltalk” for their HOPL II conference and history series (circa early to mid-90s).

One of the requirements for the Kyoto Prize (2004) was that I had to write and give a “Commemorative Lecture” that had to include some autobiographical elements. https://tinlizzie.org/VPRIPapers/m2004002_center.pdf

I wrote a tribute to the ARPA/Parc community in 2004 when Butler Lampson, Chuck Thacker, Bob Taylor and I received the NAE Draper Prize for “Networked Personal Computing”. https://tinlizzie.org/VPRIPapers/m2004001_power.pdf

Recently — for a 50th year celebration of this and that — the University of Utah asked me to give a talk about what I had done in my 2.5-year-long doctoral program — this is more of a history than a bio: “08 We Were So Lucky Alan Kay”.

Biographies are tricky — told by others they are generally quite inaccurate and often miss the point. Unfortunately, much the same obtains for autobiographies! Still the latter are the best if the authors have found ways to take multiple perspectives on themselves.

Discuss fully the history of computers and operating systems. Why are GUI-based operating systems more popular than command line interface (CLI) based operating systems?

This sounds like a combination of a command and a homework assignment (a would-be command). Please try me again with the first sentence omitted.

What profound insights did Seymour Cray have that allowed him to design faster computers than others could at the time? Do they consist of proper cooling and signal sync, which, frankly, seem to be methods that computer scientists should have known?

I had both the privilege and experience of working with Seymour Cray in 1965 on his CDC 6600 at his Chippewa Falls lab (when he was still with Control Data). I very strongly suggest that you first read the Wikipedia article on Seymour Cray to get a small but useful gist of his life and works. Then I suggest you read the Quora answer I wrote about “Idiosyncrasies of Old Computers”. I will probably repeat some material from this.

First, it was because of Cray’s design abilities that there was a company called “Control Data” and that its products were the only competition that IBM was worried about: Cray’s 1604 was perhaps the first completely transistorized computer — and it was very good — conflicts with Sperry Rand about this machine led to the formation of CDC. The CDC 3600 was a fierce competitor for the top end IBM scientific computers (like the 7090 and then 7094), and — having spent some time programming it at the National Center For Atmospheric Research in Boulder (NCAR) — I thought it was superior.

There was also an “arms race” for super computers — due to both the cold war and, in NCAR’s case, concern about the weather systems of the world, and in the early 60s, this was joined by a concern about global warming — the first good enough research was done in the late 50s by Charles Keeling, and definitive results were announced by NSF in 1963 (so most of the world wasted most of the 60 years it had to prepare).

The early buyers for SCs were Los Alamos, Livermore Labs, Sandia, NCAR, etc. IBM had a habit of quashing competitors by announcing a new machine to oppose them (whether or not IBM was going to eventually deliver, or even work on, the machine — this eventually landed them in court in a large anti-trust case, which they lost). One of the IBM announcements was about the “Stretch” (the IBM 7030), which was worked on, but IBM had a very hard time actually building this pipelined architecture.

Meanwhile, CDC promised “something wonderful” in terms of both speed and price when they announced the 6600. All the big buyers bought one before any were made and running.

Because of their past experience with CDC, the big buyers expected (a) that the eventual machine would really be as fast as claimed, and (b) that the software would be terrible.

CDC had set up a large SW team for the 6600 in California (very far from CDC’s headquarters in the midwest), and soon a number of nicely written manuals showed up about the various assemblers (the 6600 had “essentially/logically” 12 processors of two different designs). These were in the form of 10 “peripheral processors” which were essentially “In/Out Units”, made flexible via programming, and a single 100ns cycle time CPU (but if carefully programmed one could make use of its multiple fetch and store units, multiple arithmetic units, etc). It had its own version of pipelining via a smart cache (called “The Stunt Box”).

One of the catches of computers in those days was that core memory was much slower than the very fast new silicon transistors — Cray got them from Fairchild — used for CPUs.

So main memory was organized as 32 banks, with the lower 5 bits of the 6600’s addresses treated as bank addresses. One could start a new fetch on any bank every 100ns, but would have to wait 10 of these cycles (1 microsecond) for the actual data to show up. Part of the many games a programmer had to play with this machine was to keep as many banks cycling as possible (failure to do this would drop the effective CPU from 10 MIPS to 1 MIP!). The Wikipedia article on the 6600 is not very good but will help a little (I don’t think it was written by anyone who was a programmer of this machine).
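
To make the “keep the banks cycling” game concrete, here is a minimal sketch in C (not actual 6600 code, just a toy model using the numbers described above: 32 banks selected by the low 5 bits of the address, a new fetch every 100 ns, and roughly 1 microsecond before a bank can be used again). Striding through memory one word at a time keeps all the banks busy; striding by 32 hits the same bank every time and comes out about 10 times slower, the 10 MIPS vs 1 MIP effect.

    #include <stdio.h>

    #define BANKS        32      /* low 5 bits of the address select the bank  */
    #define ISSUE_NS     100     /* a new fetch can be started every 100 ns    */
    #define BANK_BUSY_NS 1000    /* each bank is busy ~1 microsecond per fetch */

    /* Returns total time (ns) to issue n fetches with a given address stride. */
    static long run(int n, int stride) {
        long bank_free_at[BANKS] = {0};   /* earliest time each bank is free again */
        long t = 0;
        for (int i = 0; i < n; i++) {
            int bank = (i * stride) % BANKS;
            if (bank_free_at[bank] > t)   /* stall until the bank can accept a fetch */
                t = bank_free_at[bank];
            bank_free_at[bank] = t + BANK_BUSY_NS;
            t += ISSUE_NS;
        }
        return t;
    }

    int main(void) {
        printf("stride 1 : %ld ns for 1000 fetches\n", run(1000, 1));
        printf("stride 32: %ld ns for 1000 fetches\n", run(1000, 32));
        return 0;
    }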

At NCAR the two head programmers — Paul Rotar and Dave Kitts — decided that CDC would flunk the software part of the 6600 development — so they decided to write their own version for NCAR’s use — and to conform to the manuals they had gotten from CDC.

I was a part-time “journeyman programmer” there working my way through U of Colorado (full time at NCAR in the summers). Despite this, Rotar and Kitts asked me to create “a fast 6600 simulator” that would run fast enough on the 3600 we had to allow them to do the systems software needed. This was a lot of fun to do, given the complexity of the 6600 HW system (it was the opposite of most machines: the instructions were much simpler (basically like RISC instructions for the CPU), but the parallelisms, timings, and coordinations were very tricky).

With a little help, I got this going in a few tense months and Rotar and Kitts happily used it for their work — I wound up doing some utility programming for them — and this got me to include a debugger in the simulator (the 6600 didn’t have interrupts — not such a bad idea, every programmer hated interrupt systems — but what was there in its place was quite inadequate).

Los Alamos and Livermore got serial numbers 1 and 2 (NCAR’s was serial 0 and was the test computer at Cray’s lab in Chippewa Falls Wisconsin).

Then the boom lowered. And Rotar and Kitts turned out to be righter than their wildest nightmares: there was no software from the official CDC software teams in California — they had only written manuals, and apparently had just goofed around while the hardware was being built.

It turned out that the other labs had experienced similar misgivings, and had also written some SW — and Seymour Cray himself had written an amazing stopgap multiprocess OS — in octal absolute, there was no assembler — in order to debug the machine.

Those who had actually done something got together and decided to send themselves to Cray’s Chippewa Labs to cook up something workable. I left school for 6 months for this (I got invited because Rotar and Kitts wanted a disassembler (to be able to deal with Cray’s octal code) and because I would also need to write a “6600 simulator on the 6600” to debug code (because of the lack of an interrupt system)).

Dijkstra had a nice phrase for the general programming in the early 60s: “logical brinksmanship”. And this could be applied to most of Cray’s (and his design partner, Jim Thornton’s) approach at every level to designing and building the 6600’s hardware.

I should say something about the setting. Cray was born in Chippewa Falls (population about 10,000 in the mid-60s) and he owned 40% of CDC stock — so when he wanted to escape the bureaucracy of the home base in Minneapolis to set up a lab across the woods from his home, they could only say yes. The lab was huge in size, with multiple large rooms (“bays”), but only had about 20 people (they could do everything). Each bay was like a separate building and only had one project in it (plus all the tools that project needed). In one bay was serial 0 of the 6600, in the next bay was what would eventually be serial 0 of the 7600, about half built already, etc.

Most of Cray’s worker bees were friendly and talkative. One of them did not talk, and wandered around in a lumberjack shirt. After two weeks there, I learned this was Seymour Cray! (who didn’t say anything until he really needed to say something …). Another image I have of him is him sitting in his office, which was completely bare except for a chair, desk, and a single piece of paper on the desk that he would be looking at. No one knew what was on the paper, or whether the paper changed over time …

What kind of mind could write a multi-process OS by literally punching octal into punched cards? (The legend is that it worked first time — and it certainly worked well when we were there.) And it had to work because the 6600 had no “blinkin’ lights” — it had a bootpanel of switches to get things started, but then lots had to run because the console was just two CRTs and some keys — called a “Gazork” because it looked like a mythical sea beast.

That kind of mind could conceive of and make a computer that just barely worked (and had many idiosyncrasies — not unlike an English sports car of the day (with its great set of tools in its boot), as contrasted with a Japanese car, which fooled GM because it didn’t break and didn’t need to be fixed and tweaked). There are two very different kinds of engineering required for these two different approaches!

For example, it is easy to get race conditions with HW, especially as it is made locally faster and faster. One way to deal with these is to have multiphase clocks that essentially perform a “wait until all signals have settled” before going on to the next logical stage. However, if you are willing to “go to the brink” you can use connections and wires as delay lines (and just make sure that everything is tweaked so that when the wires are looked at, they all have good data). This was accomplished in the small (on the HW plugin modules) and in the large, on the connections between the modules (many wires would have little tags on them saying “this wire must be 48.4″ long when replaced” — that’s a little over 4 ns of delay!). The engineers would tune the machine literally using nail clippers!

There were theoretically parallel adders and multipliers, but physically there was only one unit — the wires as delay lines trick was used to buffer the partial results of one logical unit while the other logical unit was cycling (this trick was later used to great effect on the Parc Alto by Chuck Thacker instead of putting a slower output latch after the ALU).

My boss at NCAR — the wonderful Glenn Lewis, who had gotten his PhD from von Neumann — was mild mannered, super smart, and terrific to work for and with. The only time I saw him lose it — and it was a total meltdown — was when he found out — after the delivery of our 6600 — that it had no parity bits or checking in the main memory. When confronted, Cray said he wanted to use every core in memory for “information bits”!

Here are some extracts from Paul Rotar’s history of early NCAR computing:

** I was part of “systems staff”.

You can see that Paul Rotar did not think highly of the 6600. I (later) thought of it as more of a scientific experiment in computer hardware design than an actual workable machine. Though I was a “staff programmer” I really didn’t know much about computing (my majors at CU were pure math and molecular biology). In grad school, I found myself in a deep computing culture (the ARPA-IPTO research community) — which knew a lot about HW and SW — and was able to have more after-the-fact thoughts about the 6600.

It definitely addressed many of the major problems of designing and making a supercomputer, i.e.: (a) the memories are both too small and way too slow; (b) the faster you try to run, the more heat you develop and have to dissipate; (c) there are difficult timing issues at almost every level of scale; and (d) eventually you actually have to write, debug, and run programs other than benchmarks: what is this like, and what are the tradeoffs in eventual speeds?

To address (d) first, there were no levels of program writing that were satisfactory. Extreme and mind-boggling care could result in fairly small, fairly fast programs. I wrote a few of these (data conversion as a streaming process was a good target for what could be done). However, the kinds of programs the scientists actually wanted were written in FORTRAN (they wanted to stay with HLLs), were complete nightmares in the machine codes of the 6600, and ran really s l o w … For many years, the FORTRAN used on the 6600 ran slower overall than the FORTRAN on the 10 times slower 3600!

This was because unwinding a loop didn’t help much — you had to unwind the loop and then carve it on the diagonal (to get sections of iterations i, i+1, and i+2 as a chunk). It was years before compiler technologies evolved to even deal with such complex planning issues. The result was that what Cray wanted (the fastest by far computer in the world (which he got)) conflicted with what the scientists wanted (the fastest by far FORTRAN programs in the world (which they didn’t get at all)).

This conflict has happened many times over the history of computing — and still persists. There are still very disconnected worlds of computing, and the goals — and what optimization even means — are very often vastly different. This has generally been the case for “supercomputing” of most kinds over the decades since Stretch and the 6600.

It might help the readers of this already long answer to ponder other — better — ways to deal with issues (a), (b), and (c) in light of the “scientist’s goals” for having (and paying for) a supercomputer in the first place …

For example a lot of the desired programs were essentially “fluid dynamics” simulations — the fluids could be air, water, oils, biological organisms at the microlevels, fissioning plutonium, etc. Science understands most of these in terms of local interactions and intermodule interactions. For the atmosphere, one carves it up into “cells” (whose size depends on the computing capacity), and writes code as discrete differential relationships within each cell (there might be several thousand parameters) and also in terms of how cells relate to neighboring cells — there might be thousands to millions of cells. Similar techniques are used for each kind of fluid studied.
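
A minimal sketch in C (nothing like an actual NCAR model; it only shows the shape of the computation): each cell holds some state, and each time step updates a cell from its own value and its neighbors' values, i.e. a discrete version of a local differential relationship plus the cell-to-cell coupling described above.

    #include <stdio.h>

    #define NX 64
    #define NY 64

    static double cell[NX][NY], next[NX][NY];

    /* One time step: each interior cell is updated from itself and its
       four neighbors (a simple diffusion-like rule, purely illustrative). */
    static void step(double k) {
        for (int i = 1; i < NX - 1; i++)
            for (int j = 1; j < NY - 1; j++) {
                double neighbors = cell[i-1][j] + cell[i+1][j]
                                 + cell[i][j-1] + cell[i][j+1];
                next[i][j] = cell[i][j] + k * (neighbors - 4.0 * cell[i][j]);
            }
        for (int i = 1; i < NX - 1; i++)
            for (int j = 1; j < NY - 1; j++)
                cell[i][j] = next[i][j];
    }

    int main(void) {
        cell[NX/2][NY/2] = 1.0;                /* one "disturbance" in the middle */
        for (int t = 0; t < 100; t++) step(0.1);
        printf("center value after 100 steps: %f\n", cell[NX/2][NY/2]);
        return 0;
    }

A real atmospheric model has thousands of parameters per cell and much richer physics, but the structure (per-cell code plus neighbor relationships) is the thing a supercomputer architecture either serves well or fights against.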

What architectures can you come up with that address what is actually needed (WIAN)?

Where do you think programs begin to be a "big ball of mud"? That is where do you think programs get large enough that a person cannot track all the interactions between the parts, even if the parts are coherent abstractions?

The great computer scientist — and first Turing Award winner — Alan Perlis liked to say “One person’s constant is another person’s variable”.

For concepts, cognitive science has measured “cognitive load” going back to George Miller’s 7±2 simultaneous “chunks” (this was for sequences of alphanumerics — it is more like 4±3 for larger ideas at once). Visually, we can “browse” about 100 images at once to locate a particular image (this is an important “constant” for UI designers). Beyond this, search becomes more and more important.

A typical 400 page book is about 20,000 lines (and usually organized much more than code is). 100,000 lines of code not looked at for a few years is much harder to grok than when dealt with in more recent memory.

Dan Ingalls’ first Smalltalk-76 system included its own OS, development and end-user tools, graphics system, etc. and was about 185 pages when printed out on 8.5″x11″ paper. This was about 10,000 lines of code (50 total classes) and was quite joyful to read through and work with.

Another way to look at the question is via questions of “modules at a time”. Even though we humans have real cognitive load issues, the above suggests that code put (and debugged) in a small number of simply communicating modules that have been debugged standalone can be handled reasonably well.

The catch is that the “code” here is often different in form than the codes used for the internals (this can be dealt with to a large extent by deriving the forms for the smaller modules of code from what is needed in the large — e.g. this is how the Smalltalk “language” was originally derived: the system itself is just conventions for communication and for message forms — one makes a “language” via coming up with great designs and expressive messages).

It’s sobering that what one sees on the small visual angle of a screen is usually about 50 lines of code in a window, and there is often little sense of the whole system and its possible interactions. This gets to be “mud” very quickly — and the weak module and expressive possibilities of most systems make it difficult to produce meaningful overviews.

How might Alan Kay elaborate on "You could think of it [Val Shorre's Meta II] as an active interface language on objects"?

Part of the context here is from the early thinking about objects (using Meta II as an example of a pattern matcher), and part is very relevant to today and its scalings.

Let’s suppose we’ve gotten far enough to see that “entities” that can act as whole computers communicating on a network can semantically represent anything that computers can do — and that no other concepts are needed.

We then need to think about “messages” between the objects. We should be able to see that after some levels of scaling that “messages that can command” are a bad idea — and this should lead us to conclude that this is also a bad idea on smaller scales: so messages shouldn’t be imperative.

---

Claude Wilbur in a comment below asked a good question about “imperatives” that I should have made more clear here. My reply was: What I meant was that messages should not be commands that have to be obeyed — messages should only be requests (that can be refused by the receivers), so every entity is a “protected server”.

---

We should also be able to see that scaling will start to make “names” less and less useful (for example, to have “names” that can be understood in a world of billions of possible destinations, is not a good approach).

This can lead to several useful ideas. One is to understand why human languages have “sentences” rather than just a “word” for every idea (we want to have fewer basic concepts than the ideas we want to express — so we want to be able to combine basic concepts to express new concepts, etc.).

Another useful idea is that — because words tend to be very local — ultimately finding useful destinations for our messages will require flexible description matching.

Both of these ideas lead to possibly much simpler ways to get more done.

Early pattern matching systems which influenced my thinking were Schorre’s Meta II and Carl Hewitt’s PLANNER. For interested readers, I commend the study of Gelernter’s LINDA and what it could mean as the start of being able to deal with much larger scalings and meanings.
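
For readers who want something concrete to chew on, here is a minimal sketch in C of the “find a destination by description rather than by name” idea (this is only LINDA-flavored; the out/in names and the tuple fields are made up for illustration and are not Gelernter's actual API).

    #include <stdio.h>
    #include <string.h>

    typedef struct { const char *kind; const char *topic; int value; } Tuple;

    static Tuple space[64];          /* a tiny "tuple space" shared by everyone */
    static int count = 0;

    static void out(Tuple t) { space[count++] = t; }     /* post a tuple */

    /* A NULL field in the description means "don't care". */
    static int matches(const Tuple *t, const char *kind, const char *topic) {
        return (kind  == NULL || strcmp(t->kind,  kind)  == 0)
            && (topic == NULL || strcmp(t->topic, topic) == 0);
    }

    /* Take the first tuple fitting the description; no sender is ever named. */
    static int in(const char *kind, const char *topic, Tuple *found) {
        for (int i = 0; i < count; i++)
            if (matches(&space[i], kind, topic)) {
                *found = space[i];
                space[i] = space[--count];   /* remove it from the space */
                return 1;
            }
        return 0;
    }

    int main(void) {
        out((Tuple){ "request", "temperature", 0 });
        out((Tuple){ "reading", "temperature", 21 });

        Tuple t;
        if (in("reading", "temperature", &t))    /* "any reading about temperature" */
            printf("matched a %s about %s: %d\n", t.kind, t.topic, t.value);
        return 0;
    }

The point is not the code but the shift: the sender describes the kind of help it wants, and anything able to match the description can respond, which scales in a way that webs of hard-wired names do not.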

Note: a wonderful thing about programming languages in general is that a rather small set of primitives can produce a very large range of meanings. A problem with programming languages in general is that programs beyond a certain (small) size are rapidly opaque to humans, and thus lose their explanative usefulness.

But what if a “message language” was designed much much better? And the use of it was much less ad hoc than programming too often is?

Is it necessary to make a GUI version for a simple program?

In the spirit of “Context is worth 80 IQ points”, let’s try a different perspective. A UI is the meeting ground between processes, especially between humans and media (where the “process” is sometimes the description of another “process”).

To me, then, the appearance of a program itself is its first layer of “graphical user interface”: i.e. the visible meeting ground between a reader and the intended meaning of the program.

It is often the case that the writer of a program will be unable to read it some months and years in the future (i.e. programmers need “user friendly” also).

The Turing laureate Tony Hoare once remarked that “Debugging is harder than programming, so don’t use up all your cleverness in writing the program!” (You will eventually get wiped out trying to debug it (and often in the future).)

This doesn’t promote COBOL to a favored position because what its program texts describe are often quite opaque intentions. “English” that is unreadable is still “unreadable English”.

It also doesn’t automatically damn a language like Pascal, C or an assembly code. We should see however that most line by line comments only say what the line is doing (and — oops — these sometimes don’t get updated when the code is). The reader needs to understand intent and meaning.

“Bad imperative languages” can work because, although the actual text of the program can be opaque, it can be surrounded by good to great explanations, and even integrity tests. An extreme case is Don Knuth’s TeX book “B” (TeX: The Program), in which the actual Pascal program is embedded in excellent “literate” text.

My own view is that language designers should deeply ponder these issues, because the less readable is the program text, the more strain is put on the programmers to reveal their intentions — and humans — and programmers! — being what they are, the results will often be ugly and substandard.

Doug Engelbart’s NLS system in the 60s was programmed mostly in a low-level form — basically a structured assembler with expressions called MOL — that tried to be a happy medium between the very slow SDS-940 and the higher level language that they would have liked to program in. They implemented their own Meta compiler-compiler to implement MOL, and they had the chops to make all their tools in a timely fashion, with net wins all around. A nice touch was to embed all of their code within the hypertext structurings of NLS itself, and its features of shared collaboration and explanation, and this provides a great programming tool along with all of its other benefits. (Hard to believe today’s generation both is mostly unaware of this and somehow is not able to carry off the tool design and building.)

To go into a more primitive area, in the early 60s most code was in assembler, on cards, and some programmers found that always using macros allowed them to create a “higher level language of intentions”. Some, like the legendary Bruce Baumgart, would reimplement his macro language on each new computer he had to work on so he could port useful old code. (This was very tidy, and his code was very readable at the macro level.) Note: a problem with macros is that it is almost always the case that the expansion of the macro at compile or assemble time means that its structure and meanings are lost when debugging in real time (the debugger doesn’t usually actually know about the larger structures in the code — but a richer scheme could allow this …).
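
A hedged modern analogue in C (this is not Baumgart's macro language, just an illustration of both the benefit and the debugging catch):

    #include <stdio.h>

    /* The source reads at the level of intentions ... */
    #define FOR_EACH_INDEX(i, limit)  for (int i = 0; i < (limit); i++)
    #define SWAP_INTS(a, b)           do { int tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

    int main(void) {
        int v[5] = { 5, 4, 3, 2, 1 };

        /* ... here, a single bubble pass written as intentions ... */
        FOR_EACH_INDEX(i, 4)
            if (v[i] > v[i + 1])
                SWAP_INTS(v[i], v[i + 1]);

        /* ... but a debugger steps through the expansions, not the names. */
        FOR_EACH_INDEX(i, 5)
            printf("%d ", v[i]);
        printf("\n");
        return 0;
    }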

If used judiciously, an actual extensible language can really help. Important note: in the hands of poor designers, this can be even more of a disaster than a “regular language”. This is because — in the heat of battle and debugging — even good programmers often become poor extension designers, with a “Tower of Babel” result. This is because “extension is meta” and “meta usually has vaster implications”. OTOH, getting good at this should be a required stepping stone in the learning curves of programming, regardless of the level of language.

(I should mention that we found out about this the hard way with Smalltalk-72, which was very extensible in syntax, semantics, and pragmatics. Dan Ingalls rescued this with a great design compromise in ST-76, which was almost as extensible and more readable in many ways than the earlier version, and was superior in most important ways. One of my greatest pleasures was reading the code of the entire ST-76 system when it first appeared.

However, we were generally not as tidy as Dan was when he rewrote the system from scratch every few years — even with tools, we would often commit spur of the moment design blunders which would start to fog up the clarity of the system. Having a good debugger really helped, but our mostly highly successful experiences with this way of doing things also brought home some of the design flaws in Smalltalk itself, especially in capturing the essence of whole systems.)

It seems to me — now that we are entering the year 2024 — the whole field should be doing much better here, shouldn’t it?

---

The above was written more or less assuming the longstanding practice that code is a form (usually text) that stands apart from the eventually running program itself. But in the spirit of the last sentence, I think we should ask — in 2024 — if any program should be written in a standalone textual form. We are no longer using 026 keypunches — instead we already have interactive computers at our disposal (oddly, we seem to need to frequently ask ourselves, “how can my computer help me do what I want to do?”).

For example, to make ourselves feel safe and virile, should we use our computers to simulate the 026 (perhaps with a few improvements) to produce files of simulated card images, which then have to be compiled and loaded (usually in an early bound form)? This seems completely divorced from the actual realities of large complex systems, which — besides functionality — need integrity, safety, dynamic relations to intentions, fixability and changeability in the future, etc. Recall that all of Facebook (including the company itself) took itself off the Internet for more than a day because its system didn’t know that it was supposed to be on the Internet! (This despite over 9000 unit tests for this and that!)

image

IBM 026 card keypunch

image

Simulated 026 punch cards (note: this editor was done in the 80s long after more capable integrated IDEs and hyperlinking had been done in the 60s and 70s — note the comment (which is not trying to be funny!) — almost makes one think that a huge percentage of programmers might be both lazy and undereducated!)

I.e. Moore’s Law did happen as predicted (and more so) for HW from the 60s on, but almost the opposite happened for SW — Yikes!

Just to rub it in — here are some non-lazy and industrious computerists, who saw Adele’s Smalltalk book in the early 80s and decided that they wanted to do this, despite only having MSDOS and only character displays with no pointing device. But a character display is still a kind of raster …. so:

image

Here is Digitalk Smalltalk Methods from 1983, done independently from Parc, and using techniques necessary on 8-bit micros and early IBM compatible HW. It includes a complete window based IDE and debugger (if you’d like a few more early heroes, how about George Bosworth and Jim Anderson?) — from Digitalk Methods manual found online.

I.e. “We don’t need no stinkin’ bitmap!” to make an overlapping window interface, etc.

One conclusion relating to the original question is that any reasonable IDE in 2024 should automatically provide a UI for any code written. There were a number of past examples from the 60s and 70s, and a white paper that attempted to be provocative written in 1984 with the title “Opening The Hood Of A Word Processor” …

Is the most important thing for programming "readable code"? If you had to name only one thing.

I wouldn’t name just one thing: this is a bit like asking “which leg of a three legged stool is the most important one?”

For example — if readability is (a), then (b) — in the end we are always creating systems and integrating with systems. So it is really “an important thing” that the language is very good at expressing systems, interrelationships, and runnable requirements.

If we are only picking three, there are lots of “(c)”s. For example, (c) safety/trust/accountability could be thought of as a “compound leg”.

I would still want a (d) that has to do with “creating and controlling world-lines and histories”. This would be required for a significant part of (c), so maybe that is where this hugely important, much neglected criterion could reside …

What is Alan Kay's alternative solution for the web which he mentioned in OOPSLA '97?

—— Some additions and edits made Dec 3 2023 ——

I will confess to no longer remembering exactly what I said 26 years ago …

I was shocked that the web people and the browser people had apparently taken no heed of much better visionary work in the past, which could have made a big difference in how things went and now are.

Three big examples were (a) Doug Engelbart’s NLS system (and even more important: Engelbart’s visions about collaborations and communications), (b) Apple’s Hypercard system which was both really good as it existed in the late 80s, and more importantly: showed a path for how the web could be matured to the benefit of all users, and (c) how Postscript solved important systems problems.

Instead the WWW went for simple text based markup docs, and the web browsers were generally even worse because they concentrated on consumption rather than authoring.

It’s worth noting that the option to really share was built into the OS of Engelbart’s system (and today, 55 years after his big demo, none of the standard operating systems in use have this as a basic feature).

Hypercard showed how an end-user authoring system could be learned and used by literally millions of end-users, and it provided a model for how web authoring could be done. This was initially completely ignored by the web people, and then gradually not done well (recall the actual early history of Java, and why “Javascript” is called “Javascript”).

John Warnock and Chuck Geschke were originally at Parc, where an interesting problem had arisen regarding editing on bit-map displays and using the new high resolution bit-mapped laser printer of Gary Starkweather:

What should a personal computer send to a laser printer to get arbitrary images printed?

The first solution — a data structure interpreted by the printer — broke down when it was realized that it would be almost impossible to deal with all the possible images, to update printers, etc., etc.

However, this brought up the idea of sending a process to a virtual machine of a programming language — on the printer — aimed at making images in a resolution free manner. The process could be run safely in an address space — and use the equivalent of capabilities for accessing outside references (like the pixel buffer for the printer, etc.). The printer interpreter would know the characteristics of the printer, the resolution, the media, etc. This solution would much better match up “who needs to know what”, and allow printers to be “colonized” by the simple VM interpreters. This later allowed Adobe to quickly populate printers with Postscript.

The process is thus a kind of object which contains both code and data and is safely encapsulated.

Note that this suggests — especially within an Internet — a model for dealing with a “web” and its user interfaces. I.e. send processes rather than text, and have the common denominator start with (a) an OS to deal with virtual machines that can run them safely, and (b) to map the outputs of the VMs to the user’s screen (the equivalent of the printer). There is a big opportunity (c) to think of the GUI for this as a way to help organize interactions, but also (d) to allow the processes to do their own UIs when this is a good idea.
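
A minimal sketch of the shape of this idea in C (nothing like actual Postscript; the instruction names and structures below are invented for illustration): the sender ships a little program in device-independent 0-to-1 coordinates, and the receiving device interprets it using its own knowledge of its resolution.

    #include <stdio.h>

    typedef enum { MOVETO, LINETO, END } Op;
    typedef struct { Op op; double x, y; } Instr;   /* x, y are resolution-free, 0..1 */

    /* The "device side": only it knows its own pixel dimensions. */
    static void interpret(const Instr *program, int width, int height) {
        double cx = 0.0, cy = 0.0;
        for (const Instr *p = program; p->op != END; p++) {
            int px = (int)(p->x * (width - 1)), py = (int)(p->y * (height - 1));
            if (p->op == MOVETO)
                printf("move to (%d,%d)\n", px, py);
            else
                printf("line (%d,%d) -> (%d,%d)\n",
                       (int)(cx * (width - 1)), (int)(cy * (height - 1)), px, py);
            cx = p->x; cy = p->y;
        }
    }

    int main(void) {
        /* The "sender side" has no idea whether the device is a 300 dpi
           printer page or a 1024x768 display. */
        Instr picture[] = {
            { MOVETO, 0.1, 0.1 }, { LINETO, 0.9, 0.9 },
            { MOVETO, 0.1, 0.5 }, { LINETO, 0.9, 0.5 },
            { END,    0.0, 0.0 }
        };
        interpret(picture, 2550, 3300);    /* e.g. a letter page at 300 dpi */
        interpret(picture, 1024, 768);     /* e.g. a display                */
        return 0;
    }

The real idea goes further than drawing, of course: the shipped "program" can carry its own behavior and UI, and the receiving environment's job is to run it safely and map its output to the local screen.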

Note that this also suggests that peripherals — such as printers — contain the code in a higher level language for their own drivers (and can send it to user software on the personal computers rather than choking up the local OSs with tons of driver SW), etc.

The above paragraphs are the kind of thinking that systems thinkers would do …

Another comment I might have made back then is that the initial visions were so small by the web and browser people that they thought the answer was “an app” rather than a real operating system environment that could handle what the Internet was becoming (Engelbart had a much better vision before there was even an ARPAnet in the 60s).

I recall that the web people eventually revealed that they did not know about Engelbart’s previous work.

From the “practical” point of view it is very difficult to start with a weak conception and then incrementally add on features without winding up with a kludge and unnecessary complications for all. This includes really important considerations such as safety and trust (they are hard to add to an initially non-safe and non-trustworthy system). Similarly, it is hard to scale a system not designed to be scaled.

Etc.

This is what made the success of the web and its browsers so shocking. I have a shudder every time I peruse a Wikipedia article about something dynamic (like a programming language, physical system, etc.) that has no dynamic model as part of the explanation (and Wikipedia is running on the end-user’s own powerful computer!).

What part of “computer” did the web and browser people not understand? (Note that the original web was done on a NeXT computer with a decent UI and OS — and the later browser was also done initially on computers with actual OSs and GUIs …). My guess is that they didn’t think about the future (because any thought would have revealed that what was required was actually more than personal computing already had, not less).

A no less shocking thing today is that (anecdotally) I’ve found that most people who claim to be computerists are stumped when they are asked to criticize e.g. the browsers, or the standard OSs, etc. They have become normalized to “poor” and think it is “normal” and “OK”. To me this is not only sad but dangerous.

How is math translated into code? For example, ray tracing in computer graphics, how is the math and physics equation of that translated into a code?

The answer by Jim Lux will help.

Here’s another slant on this. A lot of early computer programming was about taking mathematical formulas and writing code to compute them. One of the very first “higher level programming languages” (released in 1956) was FORTRAN, whose name was derived from FORmula TRANslation. Its main ability was to allow recognizable arithmetic formulas, more or less in algebraic form, to be written directly in the code.

(I have a Quora answer somewhere about FORTRAN. Thanks to Dane Clarke, here is the link:

https://www.quora.com/What-was-the-historical-significance-of-FORTRAN/answer/Alan-Kay-11

)

This effort was successful because John Backus and his IBM team put an enormous amount of work into not just doing the translation, but into optimizing the code that was produced to compete successfully with hand written code.

For a time, FORTRAN was actually regarded as “early AI”. Most subsequent programming languages — Algol, C, C++, Pascal, Python, etc. — have also implemented “formula translation” and similar optimizations.

---

(Added Nov 24 2023) I realized that the short answer above is a bit cryptic for the level of the question. What FORTRAN brought to the table was to have a form in which simple algebraic formulas as used on paper could be put on punched cards in almost the same form.
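
As a hedged illustration (the statement below is only a plausible example of the kind of formula being described, not necessarily the original one), such a line might look like:

    AREA = SQRT(S * (S - A) * (S - B) * (S - C))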

Note that SQRT(…) is used instead of the radical sign, and that multiplication is specific (“*”).

At the next level, most computing machines have an ADD instruction and usually a SUB (subtract) instruction. Most will also have a MUL instruction (some will have a subroutine for this). And most will not have a SQRT instruction, and will do this via a subroutine.

The task of the FORTRAN compiler is to translate the characters in the example line to a sequence of machine instructions that will perform the indicated computation. The “variables” will be modeled by words in memory (“core” memory in those days). Most machines back then didn’t have a stack, so the compiler would also allocate temporary variables to hold intermediate results.

For example:

LDA S
SUB IC
STA TEMP1

would compute the value of the last subexpression. The other subexpressions would be computed the same way and stored in TEMP2 and TEMP3.

Then the machine code would do the multiplications:

LDA S
MUL TEMP1
MUL TEMP2
MUL TEMP3

This will be the value of the parameter for the SQRT subroutine, which might be called thusly:

JMPSUB SQRT
STA AREA

which will assume its parameter is in the “A” register.

A convention on this machine might be to leave the result in the “A” register and return to the instruction word after the JMPSUB which will store the result in AREA.

Note that this code could be optimized …

Bottom line is that human conventions for “formulas” can be used in a program if there is a compiler that can translate the formulas into the machine code of the machine being used while preserving the meaning of the original goal.
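
To make the translation step itself concrete, here is a minimal sketch in Python (purely illustrative: the instruction names and the single-accumulator conventions are the hypothetical ones used above, and a real FORTRAN compiler was far more elaborate). It walks a small expression tree and emits that style of code:

# An expression is a variable name (a string) or a tuple:
#   ("-", left, right), ("*", left, right), or ("SQRT", argument).
def gen(expr, code, temps):
    if isinstance(expr, str):                # a variable: load it into the A register
        code.append("LDA " + expr)
        return
    if expr[0] == "SQRT":
        gen(expr[1], code, temps)            # the argument ends up in A
        code.append("JMPSUB SQRT")           # convention: argument and result in A
        return
    op, left, right = expr
    gen(right, code, temps)                  # evaluate the right operand first,
    temp = "TEMP" + str(len(temps) + 1)      # park it in a fresh temporary,
    temps.append(temp)
    code.append("STA " + temp)
    gen(left, code, temps)                   # then get the left operand into A,
    code.append({"-": "SUB", "*": "MUL"}[op] + " " + temp)   # and combine

# The (assumed) Heron's-formula example from above.
formula = ("SQRT", ("*", "S", ("*", ("-", "S", "A"),
                               ("*", ("-", "S", "B"), ("-", "S", "C")))))
code, temps = [], []
gen(formula, code, temps)
code.append("STA AREA")
print("\n".join(code))

It spends more temporaries than the hand version above, which is exactly the kind of thing the optimizing work by Backus’s team was there to clean up.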

---

A very early famous (and famously simple) system for translating from grammars of programming languages to runnable code is “Meta II” by Val Schorre. The Wikipedia article has an explanation of it and also has links to the original papers (read this one; it is a classic).

https://en.wikipedia.org/wiki/META_II

What programming language does Alan Kay consider to be the most advanced?

I used to track programming languages as well as I could (but as Jean Sammet pointed out in the late 60s, “There already have been over 3000 PLs designed and made. How can we sort them?”).

The ACM chooses languages for its “History of Programming Languages” (HOPL) series, but only from languages that have been in use for about 10 years, have affected other languages, etc. There have been four HOPL collections of histories so far, all published as books.

I’ve given a few talks that have asked about why software and programming languages have not even kept pace with exponential Moore’s Law HW scalings, and consequent changes in requirements (e.g. security, safety, specifications, etc.).

The main events for me after I turned 80 three years ago have been trying to avoid Covid here in London, and developing a bad cancer about two years ago (“It is what it is”, no comments needed).

But I’m pretty sure now that there could be one or two really interesting languages out there that I’m not aware of. The last time I really checked was a few years ago when I got asked to do 3 different “Software Engineering” talks at almost the same time. This got me to scurry around to contact colleagues to find new colleagues (especially in the tactical trenches) who would have more accurate perceptions of the current situation.

I found a number of top people I had not been aware of (for example, Gerard Holzmann of NASA) who had accomplished really interesting and important things. But I did not find anything of great new interest in the programming language arena. Most of the top SEs I talked to were doing projects bolted onto quite weak (and old) programming systems and requirements (for example, NASA requires its SW for e.g. a Mars Rover to be done in C (!) even though it will have to run for a year or more on Mars!). The result is that an enormous amount of computer science and engineering (and funding) is expended to take what is essentially a house of cards, and make it into a “real house”.

Universities used to be a very good place to birth new programming languages, but funders seem to have little taste for supporting these today — and worse, what I see (anecdotal as it may be) are many students getting trained in languages for legacy systems in business rather than being taught how to design and build new languages.

So:

The real question is not “most advanced?”, but “up to today’s actual needs?”

Is there a good list of requirements for “up to today’s needs?”

Are there even any first-try example languages that are aimed at “up to today’s needs?”

I find it hard to imagine that any reasonable candidate these days would not include larger facilities (such as active knowledge of important parts of the world that the results will touch (think Boeing 737 Max Autopilot, the Facebook and Google crashes, etc.)).

It should be clear that “something actually ‘expert’” is needed, but that the LLM ML approaches today are really dangerous without serious grounding in a cognitive knowledge-based system.

ADDENDA

<to appear>

What does Alan Kay think about programming and teaching programming with copilots and LLMs of today?

A little context and bias before I attempt to give an answer: I started programming as a job ca 1961 in the USAF. The programming style was “data munged by imperatives with loops” (sound familiar?). An interesting wrinkle was that the machines we used at that time (and most computers) did not have what would be called an “operating system” — but there were yearnings, and also interesting subsets of code that helped “operators” run “jobs”. In just a few years, as the result of many different forces, a number of different operating systems started to appear, and most of them had as one of their main goals to be able to run multiple processes, including processes written by different people for different reasons.

Big shifts in outlook started to happen, and many of them were ultimately about “trust” and “security” in “evermore complex systems of processes”. The history of this is really interesting — including that some of the best solutions were done very early in the game (some of these never got generally recognized by the run of the mill programming establishments). In order to get to the answer of the question, I’ll only urge readers to look into this “epistemological revolution that was generally missed by the field”.

Let me use trust as the central theme for this question and answer. Trust was certainly part of the criteria for a program before the days of multiprogramming, but it was basically whether a running single program could be trusted to do what it was supposed to do. Bugs were there, but only affected the program that had them — in rare cases, bugs could be propagated via “wrong data” written on tapes, etc., but in practice, almost all bugs were local, so the integrity of a program was quite a bit easier to ascertain and improve.

When the desire for multiprogramming happened, it was attempted on hardware that could not confine the different processes (sometimes an entire core memory image (usually less than 150KB) was rolled in and out for each job to and from a drum or even tape memory). This led to some hardware assists of various kinds, and attempts to have the cake and also eat it.

The bottom line today is that most computations are not as secure and trustworthy as they need to be (in part because some of the best solutions wound up being perceived as being “too expensive” and then abandoned).

Note that Trust becomes front and center as Moore’s Law advances, especially with advanced networking.

Note that Trust becomes IMO the dominant issue when all that has gone before is added to “NCANIPs” (non-cognitive artificial non-intelligent processes) that are allowed to run wild in forms that, as Harari has pointed out, “hack the language communications systems of our species”.

Note that “Trust” is one of the deepest issues — much larger than “just computing” — when not just communication, but actual education is one of the main needs and goals. We want to be able to know the degree of trust we can allow for what our own minds come up with, what we hear from others, what we read, our teachers, etc. A big deal with science is that it was partially invented by humans — after hundreds of thousands of years — by learning better methods than trusting one’s senses, or mere beliefs of one’s cultures.

A simple principle is that for most things that are automated and scaled, the trust requirements have to be vastly expanded and made vastly more strict.

---

Finally, to try to answer the question … (Summary: I don’t think it would be a good idea at all)

First, I don’t think the way programming is generally taught (as I’ve seen it in a number of major universities) is a good approach (and hasn’t been for about 60 years now). Qualitative changes of many kinds have obsoleted the general approaches, the languages, the operating systems, the infrastructures, and the ability to trust what can be found online — Etc etc.

Second, the genie has been out of the bottle for much of the Moore’s Law expansions, and this creates critical questions — as with most infrastructure issues — of how much effort should be expended to keep patching the problems vs actually addressing them with better, more secure, more trustworthy, and more systemic approaches. (Putting typical businesses in charge of these tradeoffs has been an unmitigated disaster — but we can see that savvy and wisdom in government is not up to the task either — still, something like government will be required.)

(Note, it used to be part of the job of universities to not just teach/help-learn ideas, but to be both “keepers of the flame” and the vanguard for “reinventing better fire”. (Claim/guess: most so-called “computer science” departments in universities today do neither — they seem to now be trade-schools for what existing businesses/organizations have gotten themselves into trouble with.))

Third, I like to praise the (actual) Internet design because (a) it was really done well, and (b) I was not involved enough in it to claim any credit (thus, my praise should be more “trustworthy”!). A key part of their design was to not allow direct sending of commands — only bits could be sent. This means that (other) software inside each physical computer has the responsibility to interpret the bits, and the power to do (or not do) some action (note that this and other ideas of the Internet suggest ways to organize software processes, etc.).

Fourth, I do think that an expert system could be made to help teach programming much better than a book can, and better than most teachers of programming do now, so that most learners can actually learn much better than they do now.

By “learn” I mean that important processes and relationships are created between the learner’s ears by their own brain.

By “help” I mean that — especially when changes in epistemological points of view from one’s own common sense are required — it can make a huge difference to be near a “special human” whose personality is strong enough to make us rethink what we think we know. Our “trust” in this person is not necessarily that they are “right” all the time, but that “they are consistently interesting, and thus worth listening to”. Occasionally, we can have that kind of trust in an author, but it is often all too easy to be lazy about rethinking …

Fifth, I think ML and LLMs, etc., do have a part in “actual machine intelligence” in the future, but not nearly the most important parts. I don’t think the systems exhibited today are to be trusted at all (and note that using such a system to provide explanations for what another such system has concluded, is just piling BS on BS). That humans also do this all the time is “interesting”, “dangerous” etc., but it is also why trying to move from superstition (this is actually what “reasoning by correlation” amounts to) to more scientific methods is critical for anything like civilization to be created.

Finally, I think the big deal here is vastly larger than the vast disasters in computing systems at all levels. If we look at human anthropology, we see a species that treats solitary confinement and banishment from society as punishments — we are a society because we can cooperate and trust a little — but when we are safely back in society, we start competing like mad (and cheating like mad) as though the society is there to be strip-mined. Scaling and technique have made this a “Spaceship Earth” issue that can no longer be put off and ignored and violated.

Robert Heinlein once warned: “The bull wears itself out on the cape and fails to see the sword”.

What is machine code? Is it really the lowest level of programming? If it is, how does this make sense as we still have assembly languages and high-level languages like C++ and Java?

This is a pretty good question.

This answer is a bit more detailed than the others (but read the others, they have useful information).

In “the olden days” of the early 60s when I was a journeyman programmer, we would often call the lowest level in which the machine could be coded “absolute code” (or “code absolute”, or “octal absolute” or “binary absolute”).

This is because Maurice Wilkes had invented “microcode” as a very good way to deal with the conflict between (a) what is called the “control problem” of a programmable computer, and (b) the desire for the machine to have a number of useful fast instructions, whose “control” would be “un-fun” in just logic gates. Some computers would have microcode and some not — all needed some form of control logic to route information (usually binary bits) around the machine from functional unit to functional unit.

An early fun thing to learn about computers is that every part can be made from a single kind of building block — e.g. either a NAND or NOR gate. All other logical functions can be synthesized from these. Sometimes for convenience, three logical functions — AND, OR, and NOT (which can be made from NANDs or NORs) — are used to help explanations.
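
A minimal sketch in Python (just an illustration, with 0 and 1 standing in for logic levels) of getting NOT, AND, and OR out of nothing but NAND:

def nand(a, b):
    return 0 if (a and b) else 1

def not_(a):
    return nand(a, a)

def and_(a, b):
    return nand(nand(a, b), nand(a, b))

def or_(a, b):
    return nand(nand(a, a), nand(b, b))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "NOT a =", not_(a), "a AND b =", and_(a, b), "a OR b =", or_(a, b))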

Because there is a lag in the operation of a gate — meaning time is also involved — it is easy to make a “clock”, which can then be used to help with the scheduling of events. For example, if we put a signal through a NOT gate the result will be inverted, and if we feed the result back to the input, we will get a train of digital pulses that are the “ticks” of the “clock”.
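
The “inverter fed back to itself” clock can be mimicked with a one-step gate delay (again just a sketch, not real electronics):

# One NOT gate whose output is wired back to its own input.
# Each loop iteration is one gate delay, so the output alternates: a clock.
signal = 0
for tick in range(8):
    print("tick", tick, "clock output", signal)
    signal = 1 - signal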

The simplest things made from primitive gates are units that will add, compare, remember, “decode”, act as clocks, etc. These are analogous to a railway switching yard with trains that can move on tracks. The “control” part is usually made of humans in coordination, routing the trains by switching switches, etc. The “control” of a CPU is logic that will route the “trains of bits” around the switchyard for various purposes.

In a long ago microcoded machine, a microcode “word” is a sequence of 1s and 0s that will open and close gates that will act as the routing switches. Usually the microcode words are looked at in sequence, and there can be branches, etc. The earliest versions would use something like diodes to represent the 1s and 0s, and the microcode would be wired in. This would be programmed by the manufacturer and would not be changeable in the field.

image

A “diode matrix” ROM for holding microcode (you can see these were simple, fast, but painful to fix bugs/modify, etc.).

What a regular programmer thought was “code absolute” would be the next level above this, and this would be fetched from main memory, held in an “instruction register” and carried out by the underlying microcode (plus some shortcuts of various kinds). (For example, the operation field in the instruction word would often lead to a particular microcode word and the microprogram would start executing there.)
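
As a toy sketch (Python; the 16-bit word format and tiny instruction set here are invented for illustration, not any particular machine), the “operation field selects where execution goes” idea is just an indexed lookup on part of the instruction word:

# Toy machine: each word has a 4-bit operation field and a 12-bit address field.
memory = {0: (1 << 12) | 100,    # LDA 100
          1: (2 << 12) | 101,    # ADD 101
          2: (3 << 12) | 102,    # STA 102
          3: (0 << 12) | 0,      # HLT
          100: 7, 101: 5, 102: 0}
A, pc, running = 0, 0, True

def hlt(addr):
    global running; running = False
def lda(addr):
    global A; A = memory[addr]
def add(addr):
    global A; A = A + memory[addr]
def sta(addr):
    memory[addr] = A

dispatch = {0: hlt, 1: lda, 2: add, 3: sta}   # the "control": op field selects a routine

while running:
    word = memory[pc]; pc = pc + 1            # fetch
    dispatch[word >> 12](word & 0xFFF)        # decode the operation field, then execute
print(memory[102])                            # 7 + 5 = 12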

Whatever the level of coding, most programmers would not write their code in terms of raw bits, but would use a “symbolic assembler” to help. These would most often have a line of symbolic code for every configuration of bits. (I wrote a Quora answer that showed some of Margaret Hamilton’s Apollo assembly code — the listing also shows the actual bits.)

However, it was realized early on that it would be just wonderful if you could dynamically load microcode! This would allow you to not just customize your machine, but to create new machines as needed. The catch was that the memory for the reloadable microcode had to be as fast as possible — faster than main memory (at least 5 or 6 times faster). This would allow the “inner machine” to cycle faster than the main memory could cycle (and thus it would look like HW rather than SW).

A fun machine was the Packard-Bell 440, which had an interesting compromise: the lower thousand words of main memory were done in terms of much faster thin-film technology (vs cores for the rest of RAM). On this machine you would load in your most used subroutines into the fast memory, and they would act like microcode instructions, etc.

Some machines had “diode boards” which allowed microcode to be switched out by hand.

An interesting “beast” was the Control Data 6600 whose “regular code” was essentially microcode, and the poor programmer had to work at a very tough primitive level to match up the fast CPU with the slower (but banked) RAM scheme.

The Xerox Parc machines all had reloadable microcode, and this was used heavily, and worked very well.

The reason for all the different kinds of languages (above the “bits”) is to help our limited human minds grapple with the problems to be solved with the least amount of interference from complex or tricky features the machine might have. (The state of this art has hardly advanced in the last 40 or more years — and this is not a good thing at all, and results in much too complicated SW.)

---

Added Sept 12 2023 — a story that might help. One of my first classes in grad school in ’66 was “Computer Architecture” by Dave Evans, who had actually done two Bendix computers while a VP there (the G-15 and G-20).

On the first day he gave us the specs of an architecture and a problem that needed to be solved. Each of us had to simulate the architecture to make a running emulation of the computer, and then solve the problem by writing a program that the emulated architecture could run successfully to get the desired result.

The only reasonable and reliable vehicle IMO was a FORTRAN that ran on the university’s Univac 1108 (which was a batch computer — so you had to submit cards and wait for a run to see what went wrong). Some of the other students actually used the 1108 assembly for this but this was just making more work for themselves.

However, the double debugging problem here — both the emulator and then the emulated program running under it — motivated me to write an actual assembler (also in FORTRAN) for the emulated machine. This was an additional pain, but wound up paying dividends in getting the whole system to meet Dave Evans’ requirements without burning out too many brain cells.

What did the first computers look like, and what could they do compared to modern-day machines?

The question explicitly says “first computers” — so, for anyone with a taste for history, this should rule out answers for computers that happened after 1955 or so, and I will concentrate on “first programmable computers”.

One nice part of the answer is that — with enough memory (e.g. many many magnetic tape drives) — the early computers could carry out literally every computation that today’s computers can. The results would take much (much!) longer to be formulated. This principle is due to Turing in the mid-30s.

In practical terms, many (most) of the things we do with computers today — though computable by the old machines — would not be feasible (humans live only so long, can wait only so long for interactions, etc.).

A fun computer system from the mid-50s was for the SAGE air defense system. It required an enormous concrete blockhouse as its “case”. The bottom floor was just for the power supply. The next floor — about the size of a football field! — contained “the Q7 computer” (actually a ganged-together double machine with more than 50,000 vacuum tubes and a physically enormous core memory (the cores were almost 1/2″ in diameter)). The next floor up was “operations” and the top floor typically held about 150 graphics terminals with pointing devices.

About 24 of these “blockhouses” were built. The last one was decommissioned in 1982!

Here’s a picture of one of the blockhouses:

image

Just to emphasize again: it does not take much of a CPU plus a lot of simple memory to allow any and all computations to be expressed.

Does Alan Kay have any updated thoughts on "People who are really serious about software should make their own hardware"?

In brief: I’ve pointed out — long ago now — that “hardware is just software crystallized early”. If we combine this with why we want computers that can take on and run software (so we don’t have to go to the trouble of making special hardware for each function), we can see that the slogan is actually requesting more reasonable tradeoffs.

In the earliest days of designing computers, the trickiest part was called “CONTROL”, the logic that would read an instruction and open and close pathways to execute the instruction. The great Maurice Wilkes came up with the idea of microcode — a fast machine within a machine with really simple fast instructions — to regularize as much of the CONTROL as possible. Often, simple diodes were used for the (read only) control memory.

It was early realized that it would be “really nice” if the microcode control memory were dynamically reloadable, because this would mean that the computer would be nicely “parametric” — one could customize it as though it were hardware. The tradeoffs here are many, but the main ones often had to do with how fast was the fastest memory the microcode could cycle, vs. how fast was the main RAM (which was magnetic core storage or worse in the old days).

The other parallel line of thought which led to the quote above came from early attempts to make custom hardware for special functions — a main one was to make machines much more suitable for running very high level languages (the nicest and greatest one was the Burroughs B5000 by Bob Barton). The bottom line on this machine was that “it was the most wonderful thing ever” (but) “it was inflexible to new software ideas” (and so probably should be microcoded to get the best of both worlds). Barton’s next work for Burroughs was just that.

We (at Parc) had big goals, but wanted them to run in personal computer sized packages. To us this meant the small machines had to be microcoded (and the budgets meant, the design had to be really good). Chuck Thacker was the main genius who accomplished this.

Also in the 60s, Utah was an ARPA project mostly engaged in inventing what today we call CGI (continuous tone real-time 3D graphics — and 2D graphics along for the ride). We took a mathematical approach and designed special hardware to do the extensive computations involved. Jim Clark, who was part of the project, started looking at a single chip design that could do the computations. This was the first GPU.

Etc. This led to the mantra above.

It’s worth noting that most of computing persisted in a dumb belief: that Moore’s Law would always catch up to special hardware (generally true) and therefore it is not worth it to design special hardware (generally false). We tried to explain that if special hardware is going to give you (say) a factor of 5 or so over regular CPUs then you will be gaining quite a few years before Moore’s Law can equal it. Those few years are critical for research, and resulted in much of the hardware forms of today.

The other thing we tried to explain back then is that if you have new ideas and try to use old hardware, you are likely to spend a lot of your effort just doing workarounds and optimizations that are just wasting your precious time.

I think this is still the case today.

What were the main ideas that eventually led to the design of modeless human computer interaction and modeless interfaces?

The other answers are pretty good, and I’ve already answered this in Quora somewhere else (see if you can find it), but just to be brief here.

The most important idea was that “modes are useful, until you want to do something else”. For example, inserting text, or working on project A. It’s when you want to change what you are doing that you forget you are in a mode and get pissed off that you have to remember to get out of it before you can do the thing you really want to do.

So “modeless” means that you can start the next thing without explicit exits from whatever you are doing and the system will clean up after you.

This idea originated at Parc, but was influenced by systems such as RAND’s GRAIL pen based system in which quite a few interactions were “modeless” in the sense above.

At Parc, the overlapping window idea I had in 1971 had this character as well. If the window was of a process, then you could touch another window, and that would automatically suspend the old process and start up the new process in the new window. This would allow you to move around at will, etc.

Both Larry Tesler and I were working with text editing and realized at about the same time that you could avoid being stuck in a “replace or insert mode” if selections started and ended between characters. This would allow selection of 0 to n characters, so all input actions would be “replace” and you could always select other places without having to get out of what you were doing.

This was a lot like the window idea and some of the good GRAIL ideas, etc., so we decided to try to do everything in such a way that new actions would automatically suspend old actions, etc.

This is what we called “modeless”. Larry eventually had a license plate for his car saying “NO MODES”.
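
A minimal sketch of that selection scheme (illustrative only, not the Parc code): the selection is a pair of positions between characters, so it can cover 0 to n of them, and the single input action is “replace”, which makes inserting just the replacement of an empty selection.

class Buffer:
    def __init__(self, text=""):
        self.text = text
        self.sel = (len(text), len(text))     # selection endpoints sit between characters

    def select(self, start, end):             # covers 0 to n characters
        self.sel = (start, end)

    def replace(self, s):                     # the only input action there is
        a, b = self.sel
        self.text = self.text[:a] + s + self.text[b:]
        self.sel = (a + len(s), a + len(s))   # the caret lands after the new text

buf = Buffer("helo world")
buf.select(3, 3)          # an empty selection between "hel" and "o"
buf.replace("l")          # "insert" is just replacing nothing
buf.select(6, 11)         # select "world"
buf.replace("there")      # typing replaces it; there is no mode to get out of
print(buf.text)           # hello there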

Why does Alan Kay call his youth misspent?

That was a joke about how many things I did that weren’t computing.

Actually, pretty much everything I got fluent at as a “youth” wound up contributing to my later thinking and results as a computerist. For example, I got degrees in both pure math and molecular biology (with minors in English and Anthropology), was a pro jazz guitarist for a number of years, did a lot of most aspects of theatrical production, served in the US Air Force, read a zillion books, etc., and too many more.

When I accidentally wound up in an ARPA research community grad school (U of Utah), what the community was doing was so interesting that I started focusing on it, and using my “misspent youth” to help (for example, the theatre and anthropology were great helps in user interface thinking … math and biology helped with making a kind of “object-oriented programming”, etc.)

I should point out here that “there was no plan, grand or otherwise” — I was simply following my nose. I did decide to get fluent in each interest, but I had no vocational or other plan. I was just “happily curious”. I had no money, so I had to scratch to support myself for all of this, but a “starving student” (even almost literally) on a university campus is surrounded by the riches of the ages to learn and use.

One way to look at all this is that the more you get fluent in, the more ways and things you will have to make analogies to, and these can lead to very different important new thoughts.

When Alan Kay conceived of object-oriented programming as being about objects that hide their state and send messages to other objects, how large were these objects imagined to be, and how closely does current teaching and programming reflect this?

Summary: they were always thought of as “all sizes” — this is what messaging allows one to think — but it took a while to invent all the software engineering needed to make the nice idea practical enough for real system building.

Though I had been a journeyman programmer for a few years, and had seen part of the object idea a few times (B220 file system, B5000, etc.), it was Ivan Sutherland’s Sketchpad system ca 1962 that got me thinking about modeling and seeing the first Simula a few days later (how to do something like OOP by instantiating Algol-60 blocks) that got me to see the analogies to (a) cells in Biology, and (b) algebras in math.

I was in a maths mood at the time, so it was easy to see that a virtual computer could model anything on a computer — no matter how large or small — and that they could be recursively combined using messaging. This provided a really simple framework of VMs (with the internals being VMs) all on a point to point network (like the logic of the later ARPA and Internets).

The state of software engineering in 1966 was not advanced enough for either the very large or the very small to be rendered as an object. Most of the things “like objects” had a fair amount of overhead, and were thus rather large entities.

But the “maths” idea was too nice to give up. Meanwhile, I encountered LISP, which was not just a “neat mood” but an actual example of how to do both some of the maths and some of the software engineering needed.

For example, one way to look at passing a message is by thinking slightly differently about APPLY in the context of an FEXPR LISP. Another example is that most LISPs since 1.5 have mapped small integers into the protected LISP “entity space” (and the B5000 maps all numbers into the protected address space, etc.).
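
One way to picture “message send as apply” (a minimal sketch in Python, not how Smalltalk or any LISP was actually built): an object is simply something you can apply to a message, the state stays hidden inside it, and the receiver alone decides what the message means.

def make_counter():
    count = 0                          # hidden state: only messages can reach it
    def receive(selector, *args):      # a message is a selector plus arguments
        nonlocal count
        if selector == "increment":
            count = count + (args[0] if args else 1)
            return count
        if selector == "value":
            return count
        return "doesNotUnderstand"     # the receiver decides what to do with a message
    return receive

c = make_counter()
c("increment")            # "sending a message" is just applying the object to it
c("increment", 10)
print(c("value"))         # 11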

The FLEX language (for the FLEX machine of the late 60s) borrowed more SW ideas from Wirth’s Euler than anywhere else, and also used ideas from Schorre’s Meta II and the Floyd-Evans method of parsing.

Dan Ingalls was my indispensable partner for the Smalltalk project at Parc. The first Smalltalks were rather Lisp-like, and then — as we got more skilled at the SE — we were able to use ideas from more sources (including the FLEX machine) and from original inventions.

We could have done a bit more for arbitrarily large objects, but — since our goal was inventing modern personal computing, etc. — we mainly aimed at “clean, small, simple, and powerful” for Smalltalk. This worked out well for our project at Parc.

However, it’s worth emphasizing that messaging and late binding allow pretty much all improvements within this kind of architecture to be done while the system is in use (and without having to be stopped in order to make any changes, fixes, or improvements).

Note: Instantiation of an “idea” is a very good idea. “Inheritance” needs to be really controlled in order to not turn into a nightmare (this is why I left it out of the first Smalltalk — Simula I didn’t have it, and Simula 67 did). It was pretty easy to see that “some things could be accomplished” with not just inheritance but multiple inheritance, but the potential for mess was huge. Mathematically, it is a good thing to contemplate — but as a large idea that is part of questions about abstractions.

I’ve noted elsewhere that “object-oriented” to me is not a programming paradigm, but a definitional scheme — it’s a way of making things that carry their definitions with them. A programming paradigm needs to include enough design principles to guide designers through complexity and scaling. Messaging between instances of ideas is part of this. Experience would lead me to look at compositions before inheritance (the PIE system of Goldstein and Bobrow used Smalltalk in an interesting way to do this). I think I’d look at “modeling time” before looking at inheritance. Etc. But I do think that establishing a tight enough meaning for inheritance could add more clarity than murk. I haven’t seen a good example of this.

We got away with using inheritance in the later Smalltalks at Parc because we — and especially Dan Ingalls — were very careful with it. A later experience with Morphic in Squeak was not so happy.

Why did Alan Kay move to London?

Short answer is that my wife — the amazing Bonnie MacBird — decided she wanted to write a Sherlock Holmes novel in the authentic voice of Doyle’s Watson. She had been a super fan since childhood, and this project had been on the back burner for years.

As she started off, she had the idea that it might be fun, motivating, inspiring, etc., to do some of the writing at the “Sherlock Holmes Hotel” on Baker Street, London. It turned out to be nicely funky, an establishment on different levels that had been stitched together into a hotel many years before.

And it was right in the middle of one of London’s many nice “neighborhoods” — Marylebone — with nearby parks small and large, a “high street” with “everything”, the Royal Academy of Music, Wigmore Hall (one of the best sounding in the world for ~550 people audiences), Baker Street Station (which is a main hub with many underground lines), in walking distance of Oxford Street, etc. Many of the buildings around the hotel are Victorian flat “blocks” that were built in the 1890s, Holmes’ fictional era.

And there is the larger London, which includes some of the very best English language theatre in the world, and in great profusion. What had years before been very spotty food had been converted, partly by the influx of other cultures including Indian and French, into more good restaurants than one could sample.

Bonnie’s writing was going well, and we kept going back. I made up a “Hauptwerk organ keyboard and pedals” that the hotel stored for us, so I could play and practice while we were there.

I should mention something ephemeral and hard to pin down here: we — and I think most Americans — can detect a palpable “spirit of kindness” from most people in London. Teenagers will actually give up their seats on the underground to older people. If someone sees you need any help they will come and offer, etc. The “feel” of London is a good feel.

At one point, I noted that “If we were to live in London half-time we could break even on a flat rental for what the hotel was costing us”. We looked around a bit and found a nice little Victorian era flat across the street from the hotel, one block off Baker. This had a double “sitting room” and Bonnie fixed hers up to be the 221B sitting room of Holmes and Watson. Here’s a view out the window during a very rare snow:

image

Our flat building is like these. Our small flat reminded me of my student days, and we both have liked it a lot.

Then came the shock of Trump’s election, followed by the pandemic. We almost flew back to the US for the pandemic, but decided that both the UK and the US would botch it, and that we’d rather try to sit it out in London than in LA. This turned out to be a good choice.

During all this Bonnie had a contract with Harper-Collins and has written 5 highly regarded and reviewed Sherlock Holmes novels while we have lived in the flat. The Internet and wireless are good in the flat, so we were very much in contact with stuff we were doing, especially as the pandemic took hold.

Then I wound up getting a very rare and bad kind of cancer, and one of the world’s experts happens to be in London. Before 2013 there was no treatment, so I’ve been lucky to have the new “immunotherapy”, which a small percentage of the time will delay matters if you are in Stage 4. There are side effects, but the cancer itself has been delayed so far. At my over 80 age, this is a “cycle of life” kind of deal, so I have no complaints.

So: “London” was not a plan, but just gradually happened, and we are both happy how it turned out.

Why did you major in mathematics?

I always was interested in math, and especially after reading in a Robert Heinlein scifi book for youngsters that “No matter how or where you wind up you should understand mathematics”

I never had a plan to be a mathematician (I think); the other half of my double major in college was Molecular Biology (with two big minors), and I worked my way through school and bed and board as a journeyman supercomputer programmer in the mid-60s.

I also delved deeply into too many other wonderful things as an undergrad: music, art, theatre, etc., so when I graduated in 1966 I was too exhausted for anything. I couldn’t stand the idea of right away going to grad school in either of my majors, and I knew I didn’t want a real job then — if ever.

So I moped around for a few months until a thought popped. I could program, but like most programmers I knew almost nothing about “computing”. The thought came “why not spend a year relaxing getting a Masters in computing?” That should be easy, fun, and restorative.

However, I loved living in Boulder, and the University of Colorado did not have a Masters in computing. So I went to the library and delved for places that did and were above 4000 feet altitude to have that mountain air (Boulder was about 6500).

There was only one — at the U of Utah across the Rockies on the other slope. I sent off a letter not expecting anything, and to my great surprise was accepted.

I arrived “knowing nothing” and found that I’d wandered into the ARPA research community and one of its best venues.

Since the Smalltalk programming language has roots in biology, is it a perfect match for various computational biology applications (like systems biology, agent-based modeling)? If so, what are some good examples of existing applications?

Smalltalk does have some of its roots in biology — but certainly when I was thinking about this kind of language in the mid to late 60s, I was not using much — if anything — of what I understood about molecular biology (a really early book on Mol-Bio was Watson’s “Molecular Biology of the Gene” in 1965 — which was wonderful, but at scalings that weren’t very applicable to computing at that time).

The parts of biology that had a large influence on me at that time had to do with the immense scalings of components and interactions, both within cells — in the millions and billions — and in multi-celled organisms — in the millions, billions, and trillions of cells.

I was very interested in developmental biology, but thought how it went about it was beyond the scalings of computation back then. What were really interesting — and applicable — were (a) the systems integrity involving even trillions of components (large things could still work after decades) — and (b) that the scheme involved just one set of DNA in each cell, and parts were controlled and expressed to get about 250 “kinds” of cells to develop and interact as tissues and with other types — and (c) that there were many levels and kinds of “noise detectors and removers” — including at the cellular level — that were sufficient to quasi-stabilize the whole “mess” — and (d) that most of the metabolism and contents of each cell were used for local stability — very little was used to contribute to the whole organism.

I was also influenced by the “polymorphism” of the endocrine system — but this didn’t get developed as much as it could have.

There were already quite a few “module schemes” in computing — most of them large and with overheads. I greatly desired everything — including “small things” — to be a ramification of one kernel type that could ramify out to the number of types convenient for an organization. (I got sucked in by math because I could see (via McCarthy) that this could be done “from out of almost nothing” — and this is what we did.) Later, I realized that putting a lot of effort into designing a capable “Class object” would have been a better path.

I think Smalltalk is still worth understanding. However, I personally would start from scratch with a new language that is much more in accord with “what is actually needed?”. I.e. it would also have modules, but they would be rather different from Smalltalk’s of more than 50 years ago now.

Alan, you have written numerous essays and given many talks on the topic of computing, but you have never published a book on the subject. Is there any particular reason behind this decision? Is it related to your idea of the Dynabook?

Probably the simplest answer to the question is that I like to work on things I have “urges” to do, and I’ve never had the “urge” to write a book.

On the other hand, I do have an urge to answer questions, and Quora reveals that I’ve written 607 answers thus far (so this is answer 608).

I think that I average about 500 words per answer, so the Quora corpus would be a thick book of over 300,000 words as a kind of kaleidoscope of paragraphs and topics.

This wouldn’t be much of a book. A similar process with talks would yield another kind of kaleidoscopic — not very useful — book: an hour’s talk at about 135–150 words per minute is about 9000 words, so 10 talks would be a “book” (and I think I’ve given hundreds of talks over the years). And similar arithmetic could be done on written papers and articles.

A big deal is summed up by my “one sentence bio”: No one owes more to his research community than I do.

In other words, what I think of as my “actual work” over my life has been done within communities of like interest and directly with major contributions from great colleagues.

Two of my favorite books of “real substance” are Newton’s Principia, and The Molecular Biology of the Cell, by Bruce Alberts, James Watson, et al.

The first is an incredible tour de force in all dimensions and a kind of work for which I’m more than a few notches short. (I do love this book!)

The second book is not just long and great, but spectacularly keeps in mind what its readers need to be “told as prep” — for example, there is a wonderful early section on the chemistry (and atomics) of water which would not be in most biology books, but is included in TMBOTC because most readers will likely not remember the details well enough from their previous chemistry classes. Having a strong visualization of what water does is crucial to the larger understanding of how life works, and this book takes the trouble to make sure you have it in mind as you read on.

The book was first published in 1983, there are now 7 editions (to try to keep up with the explosion of knowledge since then), and the authors periodically meet to work on the needed new editions. (For those who are interested, one of the classic editions was the 3rd …). How they do it is a mystery, even to them!

I think we can see real analogies to computing here — and also see many areas where computing falls short of more established real disciplines of Physics, Chemistry, Biology, etc.

Something interesting and deep to ponder about Biology is that (a) on the one hand “Biology means variation” — evolutionary processes find many viable organizations of atoms and molecules to make up living entities, and (b) on the other hand, “Biology means speciation” — a larger look than at the variants reveals amazingly stable and conservative patterns and building principles, many of the most important of which are hundreds of millions of years old, and some billions of years old.

The latter means a book like TMBOTC can be written in “only” 1000+ pages and successfully provide a very strong view of the most important building blocks of living entities. The former allows many different examples to be given within the larger framework.

A side comment is to note that the advantage of a physical science is that it has to be based on observations on existing entities. This really helps the start of choosing useful perspectives on the subject.

I think such a book is possible about computing, but no one quite like the amazing Bruce Alberts has stepped forth and mobilized the group of authors needed to pull it off. I think I could be one of the authors — that would be difficult but rewarding — but I couldn’t play Bruce’s role.

In any case, I’m now well into my 80s, and with a not good type of cancer which saps energy. So this is a cop out for me now, but it’s a real one.

A key corollary to the contrast with nature and physical science is that the synthetic — artistic — creation and construction in computing gives rise to way too many perspectives compared to examining nature, and I think most of these perspectives in current day computing are too weak and limited. This makes most books about computing very disappointing to read (yet think about how much work had to be done to make a book at all!)

A “really good book” about our field would need to take a deep and reasonable stand about computing to gain enough context to say much of actual substance.

---

P.S. I now realize I shouldn’t have mentioned the cancer, and just should have said, “not the energy I used to have”. I sincerely thank the commenters who have wished me well — but let’s get past this if we can: it’s just a “cycle of life” thing that is part of an inevitable process for all human beings.

Does Alan Kay see the educational goals behind the Dynabook concept furthered by AI systems, especially in light of developments like GPT-4? Can this improve computer interaction and computer aided learning?

One of the earliest stated goals — in the late 60s — for real education for children, perhaps helped along with a “Dynabook”, was: to have children grow up to think better than most adults do today.

Most of the time we thought of this as part of a vastly better system and concept of schooling, not as a replacement for what we thought was a currently broken schooling system. This was the case even though I had personally learned quite a bit from reading on my own; still, I had quite a bit of help learning to read early, and there were adults around who could be asked questions and often would point to something useful to look at further. A well-written book could also do a little of this if the author anticipated some of the questions that the writing would raise, and included further directions as part of the writings.

Another part of thinking about what it might mean to be “better educated” was “epistemological” — one’s way of looking at the world of “out there” and the world of “our opinions” could and should be vastly changed as “education” progressed. An early champion of this idea was Francis Bacon, who called for a “new science” whose methods could help us get around the terrible thinking done by our “bad brains” (what he proposed was what we just call “science” today, but applied to everything, not just the study of external nature, including biology, but also of our own thought processes, and how they could be made less vague and error prone).

We also correlated “civilizations and civilized thinking” with “literacy” — the correlation is definitely there — but it wasn’t until some years later that the breakthrough work of Scribner and Cole — “The Psychology of Literacy” — showed very strongly that literacy alone isn’t enough to do the job — there also has to be present certain kinds of structured schooling in order for reading and writing to do their magic. So, in the early years of the Dynabook idea, we were working on just a part of the idea, that: there is a new kind of reading and writing brought by computing — which includes the old, but adds new very important properties — and there is something like a “new literacy” yet to be discovered, which can have a much more powerful and beneficial effect on civilization than the reading, writing and literacy of the past.

Most of the designs and systems we built for children were not good enough: it took about 25 years to finally come up with a set of ideas and processes that could carry “a new literacy” — and by that time we had gotten a lot more sophisticated about the difficulties of getting good new ideas to actually have good new effects on children’s thinking, and of their having fun doing so. This required quite a bit of work with actual schools and schooling.

I think we were able to do an “illustration of concept” — using our system “Etoys” as employed in the Open Charter School in LA in the “noughts” — but the next phases were not done (mostly because of lack of “good funding”).

So an important part of the question here is whether we learned enough to answer the question in any reasonable way.

I think one opinion can be given, and this is that putting children in environments that are devoid of “real and sufficient meaning” — whether in homes, schools, books, or advanced technologies — and especially any environment that acts like a “Skinner Box” is a terrible and unethical idea.

The current round of chatbots are not close to what is needed — and note that far too many adults are no improvement, and are often even worse. And the assumptions behind most schooling today are quite terrible for helping children grow (and with or without technology). Much deeper views, understandings, goals, and visions are required.

After trying to understand this area and find ways to get it above the “What is actually needed?” threshold for more than 50 years now, I still think (a) that Bacon’s insights are valid, (b) that really good schooling with the aid of really good technologies is possible, and (c) that the current confusions about all of this out in the world are worse than they were when the Dynabook was thought up in 1968.

P.S. Much the same kind of response could be given today about questions regarding “What happened to Doug Engelbart’s real insights and goals?”

What is the specific reference for Alan Kay's discussion related to people in business not wanting to hear complaints and why they should welcome them?

I don’t recall saying this.

What is the meaning of Alan Kay's quote "the best way to predict the future is to invent it"? How can this be applied to any field of work?

This came out in a fit of frustration during a meeting in the early days of Parc with the Xerox corporate planning group.

I wasn’t trying to coin an aphorism, but this accidentally happened.

I happen to really like aphorisms — I think of them as a “special domain of prose that is also poetic”. They seem to be trying to tell the truth in a compact memorable way, but I think of them as trying to point the receiver in a direction of more enlightenment.

As with poetry, too much careful analysis can kill the “glow” that is shining in a direction.

Another one of mine that has made its way around the Internet is “Point of view is worth 80 IQ points!”.

(Note that the sign bit is not specified, but needed — but would kill the glow of the larger idea.)

The big idea is to differentiate between “reasonable and accurate thinking” in a random typical context, vs. “the same level of reasoning” in a really strong context. This came from trying to get audiences to realize (a) they were even in a context, and (b) that everything changed when science was invented as a new context to reason within, and (c) that most of our improvements from 200,000 years ago were via gradual cultural learning and teaching, and that in the last few thousands of years were via “inventions of new methods” that could piggy back on our language and cultural apparati.

I once started a talk with “the easiest way to predict the future is to prevent it!” This is what most people and organizations do. Then I went into the aphorism above.

But then — this was at NIH in Wash DC — I reminded the audience that the town was full of politicians “inventing the future” (but inventing terrible versions for us).

This is why I think a good aphorism is working when it draws attention to thinking deeper — taking even good ones literally is at one’s own peril.

However — that said — here’s perhaps my favorite one (it’s from Einstein):

As far as the laws of mathematics refer to reality, they are not certain, and as far as they are certain, they do not refer to reality.

This one really gets to the heart of many important matters — far beyond both math and science — and works extremely well just as he stated it.

What does it mean to be an "Alan Kay" programmer? What are some ways to become more creative in programming?

An “Alan Kay programmer” is a person with a few odd, potentially worthwhile, ideas, and deep enough knowledge and skills to work with a group of people with much deeper skills.

For example, this question would never have been asked if I hadn’t started working with people like (especially!) Dan Ingalls, Ted Kaehler, etc. I could program pretty well — especially small, mathematically dense stuff that might have to be in microcode — but pretty much everyone I worked with could code rings around me — and I wanted them to! Sometimes I’d code an idea I couldn’t get anyone interested in — knowing that they would most definitely rewrite it to be much better.

Etc.

This group entity idea was not uncommon in the ARPA/Parc community. Even Ivan Sutherland, whom I consider to be at a qualitatively higher level in most things, got a little help to do the amazing Sketchpad, and very much preferred to subsequently set up groups for really hard, really interesting problems (and this even though he was extremely self-contained).

Bottom line: if you can achieve synergy in a group you will usually get exponential factors of improvement.

Do you agree with Alan Kay's comment that a computer science degree teaches students how to write bad code?

I never said that — we should at least pose an accurate question.

One of the (many) things I have pointed out is that a computer science degree is no guarantee of “the kind of perspectives that are needed for (good) computing” — this has parallels with the larger idea that a university degree is no guarantee that the recipient is now at a “what is actually needed” level of learning and understanding.

I’ve also said that I nonetheless advocate going to university if possible, because there are generally more chances to encounter ideas that will trigger real thinking and real desires to learn.

---

P.S. by accident, and thanks to Richard Strachan, I found the below from Kurt Vonnegut right after writing the above (to me this is priceless!):

In 2006 a high school English teacher asked students to write a famous author and ask for advice. Kurt Vonnegut was the only one to respond - and his response is magnificent:

“Dear Xavier High School, and Ms. Lockwood, and Messrs Perin, McFeely, Batten, Maurer and Congiusta: I thank you for your friendly letters.

You sure know how to cheer up a really old geezer (84) in his sunset years. I don’t make public appearances any more because I now resemble nothing so much as an iguana.

What I had to say to you, moreover, would not take long, to wit: Practice any art, music, singing, dancing, acting, drawing, painting, sculpting, poetry, fiction, essays, reportage, no matter how well or badly, not to get money and fame, but to experience becoming, to find out what’s inside you, to make your soul grow.

Seriously!

I mean starting right now, do art and do it for the rest of your lives.

Draw a funny or nice picture of Ms. Lockwood, and give it to her. Dance home after school, and sing in the shower and on and on. Make a face in your mashed potatoes. Pretend you’re Count Dracula.

Here’s an assignment for tonight, and I hope Ms. Lockwood will flunk you if you don’t do it: Write a six line poem, about anything, but rhymed. No fair tennis without a net. Make it as good as you possibly can. But don’t tell anybody what you’re doing. Don’t show it or recite it to anybody, not even your girlfriend or parents or whatever, or Ms. Lockwood. OK?

Tear it up into teeny-weeny pieces, and discard them into widely separated trash receptacles. You will find that you have already been gloriously rewarded for your poem. You have experienced becoming, learned a lot more about what’s inside you, and you have made your soul grow. God bless you all!"

Kurt Vonnegut

image

How did Alan Kay recruit and interview his researchers at VPRI?

I just noticed that I have been misreading this question — or it got changed? In any case, my answer below was about my ARPA and Parc experiences. I’ve added a little bit about Viewpoints Research Institute (which I didn’t set up until the early 2000s), and put this below the main answer.

---

Parc grew gradually, in part because — as Bob Taylor told Xerox — he “didn’t plan to hire any ‘good’ people”. This shocked them, and he explained that he was only going to hire ‘great’ people, because you can’t really approximate a great person with any number of ‘good’ people, and that ‘good’ people needed to be managed, and he didn’t want people who needed to be managed, etc., and this all meant that it would take a while to staff up Parc (and that computing research at Parc would be relatively cheap (because of the scarcity of ‘great’ people)).

And then a tremendously lucky thing happened. A business downturn put the newly formed “Berkeley Computer Corporation” (BCC - making a new kind of time-sharing computer) into financial collapse, and Taylor was able to hire the entire group of 8 or 9 people — these included Butler Lampson, Chuck Thacker, Peter Deutsch, etc. (i.e. a whole group of actual ‘great’ people), and in a flash, in Jan 1971, Parc computing research had a critical mass just a few months after starting up.

I was consulting there at the time, and soon after called up my friends at CMU to ask forgiveness for breaking my promise to go work there in a few months — I said, “with Butler and Chuck and Peter and Ed Fiala, etc, we can now do anything”. Part of the key here was the unique brilliance in all directions of Butler.

Still, after this critical mass windfall, what Taylor predicted happened. The hiring process at Parc was slow for both Taylor’s reasons, and that he wanted the existing staff to all really want (really really want) the next person to be hired. Taylor’s idea was that it was worthwhile to put a lot of effort into this, because it would pretty much eliminate rivalries when the new person came on board. Of course, we complained to Taylor about the time taken for this, but he was completely right (as usual).

But because of the BCC influx and the few who were already there, lots of really good stuff got started. There are a few interesting digressions about how the small headcount, and the way Xerox treated headcount, shaped Parc culture, but I’ll avoid them here.

As for myself, and the group I put together, I’ll oversimplify (too much) and just say “by literally talking and literally handwaving” (the hand usually held the cardboard model of the Dynabook I had made a few years before, after meeting Seymour Papert).

Some of the people who were attracted were already working for others at Parc, and others came as the result of talks I gave around about “destinies”. I was only interested in those who got “stars in their eyes”, and I figured I could help them fill in the rest. This also took quite a while.

However, I got really lucky early on: in a casual hallway conversation, Dan Ingalls — who was working for someone else — got “stars in his eyes”, and so did Ted Kaehler. The impact these two had (for the rest of my life) cannot be overstated or overpraised. Similarly, John Shoch was a student in a class I taught at Stanford, and he was really interesting. I invited him to do a summer internship and he never left (and wound up being a major factor in networking of all kinds at Parc).

He thought I’d be interested in meeting Adele Goldberg, who was at Stanford doing a project with Pat Suppes, and “to meet Adele, is to meet a truly top-class multidimensional force of nature”, so I was very fortunate to hire her (she also had “the stars”, etc.). Adele brought Steve Weyer along from Stanford.

Diana Merry was a PA for the head of the Physics Lab at Parc very early on, and I happened to see her do something with a time-sharing typing service that required a “programming kind of thinking”. I said to her, “you could be a good programmer”, so I hired her away and taught her a few things (which she learned quickly, and she started doing helpful things with Dan and Ted). In addition, Diana turned out to be one of the great “bulldogs” of all time — nothing daunted her.

Dave Robson was a student (I think at UC Irvine). He had heard of Smalltalk and got quite far by himself trying to reverse engineer it (just from vague descriptions). We invited him to be a summer intern and basically never let him go back.

So the summary answer is “pretty randomly”, and often via processes like temporary internships that allowed us to see how much “depth and fun” a person might be. There wasn’t a big plan about it. An interesting thing about this group was what I have to call “love” — the people really liked each other, despite the huge differences in personalities and kinds of expertise they had (and this feeling persists 50 years later). This allowed a lot of ground to be covered.

If we cycle back to “ARPA ways” we will find a very loose organization that was not really “hierarchical” nor really “democratic”. Argumentation was almost never for the purpose of winning, but to expose more points of view (this is a hard one to explain to people). Many many times after long “discussions” the group would then defer to the person they thought had the best view of things for the final decision (so “voting” was rare, but things worked).

Bob Taylor was once asked just what he did at Parc (he had been a research psychologist, not a technical computer person). He said “I wanted to get great (and self driven) people, and they tend to be lone wolves and not really manageable (and I didn’t want to try to manage them). I wanted to set up an environment that — when these people needed to cooperate to make much better progress — they would do this”.

This is what happened — it was quite wonderful — people would really be thinking their own thoughts, but would wind up cooperating when that was a good idea. A good example is that pretty much everyone at Parc could design and make a programming language/OS, etc. (and did). So the agreements there were minimal. However, we had also decided to try to make “100” of anything we invented — and this meant 100 Alto personal computers, the Ethernet had to allow at least 100 taps, etc. This required more cohesion — and this happened — and still without any top down pressure or planning. (It’s also worth examining the ARPA process over some years that created the standards and methods for the Internet.)

In my view (and Butler’s view, etc.) Taylor was a kind of social genius with regard to how to deal with talent, and get it to stay original, yet also get it to synergize.

---

VPRI was started “AD” (after Disney), and originally was a vehicle for staying with some of my favorite colleagues (this was right after the “bust” of the late 90s, and right before 9/11). The greatest early aid was my friend Bran Ferren (a very special person in so many important ways), who had pretty much been CTO of Disney for a few years. He managed to talk Disney out of a few hangar buildings on Disney’s 100 acre “Imagineering” lot for a new company he and Danny Hillis put together called “Applied Minds”. They liked having us “hang around”, so Bran offered us free space if I was willing to site my NPO VPRI there. (This was a great thing and a lot of fun!)

But we had no funding, so we scrabbled around in various ways. “Research means you can change your mind”, so I have always avoided working on problems/goals posed by funders, and stuck to this regardless. Kim Rose was completely essential to every part of our process — the highest priority part was that we decided to “never miss payroll” — and we didn’t (sometimes via dipping into my savings, etc.).

The other principle — that I’d gotten from Dave Evans in grad school — was that you should never lie to a funder. His underlying principle was that you can’t do what he called “good science” if you get into the habit of lying. So — he said — you have to start “learning how to tell the truth in very special ways”.

VPRI was on a very small scale for much of the time, but top talent was still involved, so we did what we could do — some of this was to do some major investigations of Etoys in schools, etc.

Some of the additions were really random circumstances. For example, I had spent quite a bit of time off and on in Japan, and was even asked by the Japanese government to help them spend some funding they wanted to use to widen their talent pool. This turned up a number of “real deals” who had very little connection with the normal Japanese channels.

One was an artist of considerable talent — Takashi Yamamiya — who also had considerable talent with computing, and he was unencumbered by any kind of fear of being weird. He did some really wonderful things with us for a few years, and wound up going back to Japan, got married, and returned to some of the art he had been doing (we miss him very much — he was a great person in all respects).

I will tell another story — later and elsewhere — about my long time relationship with Yoshiki Ohshima — a real friend and a great colleague. His advisor in Japan brought him to us while we were at Disney with the warning: “Be careful, this young student has strong opinions and expresses them often”. My reply was “Great, I’ll hire him right now!”. Yoshiki is one of the best people I’ve worked with over my career …

I would occasionally teach an “Alan Kay mystery hour” class at UCLA, and did one about education, computers, etc., run jointly with a class in the US and one in Japan, using special equipment. We had a variety of interesting kinds of students, but there was one who was really interesting — really really interesting — who was just sitting in. He had a few insights about the computer tool we were using that were startling, and I invited him to be an intern at VPRI, and wound up being his thesis advisor, colleague, etc. This was/is Alex Warth.

I should mention Bret Victor. I had heard of him and seen some of his stuff — we got together in a park in Palo Alto. About 15 seconds after talking to him, my thought was “Wow, this guy should have his own research group — VPRI is too small a venue for him” — so I hired him as a consultant, until we wound up finding enough funding for YC and Vishal Sikka to set up HARC — and gave him the larger place to play that he needed.

I met Vi Hart — via a lunch we set up — because I wanted to ask her questions about her process of making completely brilliant video explanations of things, especially in math. She was being funded by Salman Khan — which to me was a wonderful thing because I thought Khan’s videos were almost “anti-real-math”.

It turned out that they had hired her, but weren’t supporting her. My response was “Oh, I’ll hire you right now! Come to HARC and follow your instincts!”. And she did.

HARC did not last long, but we were also able to get Dan Ingalls to move there from Sun, etc.

And so on and on …

---

A basic idea here is that — although there is nothing really wrong or sinful about improving on things that are around (sometimes, if only for safety, this is what you really want to do) — it is also the case that we think our largest thoughts by inventing new contexts that are often very different from those around previously.

This is difficult to do (because you are trying to do this with what you learned in contexts that need to be escaped). By societal definition it is “delusional” and even often considered “crazy”.

And it is very difficult to tell a “new great-contexter” from an actual “crazy person” (or even just a blowhard). Dave Evans liked “other side of lines” types, and he would give a chance to quite a few who eventually didn’t work out. He didn’t care when they didn’t (and ARPA only gave Dave funding, they didn’t control his process), so Dave happily kept on giving people chances — and he wound up with enough “geese who laid golden eggs — rather than lead ones or worse” to staff up a first class ARPA project. (This was very similar to ARPA’s philosophy on the big difference between “error” and “overhead” that I’ve written about elsewhere on Quora.)

Can I become a competent programmer if I start learning computer science without attending university lessons, relying only on textbooks and internet resources?

I feel that this is an honestly asked question — and it is a difficult one, in part because it isn’t complete (it lacks many side conditions, such as how you are going to get experience writing programs, and especially how you are going to get feedback to help you improve both your programming and design abilities).

It could have easily been asked the same way with the substitution of “musician” (or “instrumentalist”) for “programmer”.

I certainly don’t think you need to go to university to get fluent at either programming or music — but I also think that both really require good advice and feedback from fluent enough entities (mostly human, though it’s possible to imagine good enough computer helpers for quite a bit of both).

Since most of both will wind up between your ears, the largest part of the process is lots of hard work with good examples over a fair amount of time. With good feedback, you can accomplish a lot.

I don’t know of good computer “advisors” for either (but I have seen low level computer aids for both, that do provide some elementary feedback).

An important caution is that any kind of curriculum — even bad curriculum — is difficult to make, and really good curriculum in most subjects is hard to find, whether from humans, textbooks, or the Internet.

Here’s an example in music. “They” say:

“There are four beats to a measure, and a quarter note gets one beat, a half note gets two beats, and a whole note gets four beats”.

Musically this is quite untrue — no musician will play four quarter notes evenly to perfectly divide the measure into four equal time durations. Musicians use another parallel idea called “pulse” (in jazz, called “groove”) that is a human approximation to this, but doesn’t feel metronomic. Singers are rarely metronomic, and singing is a great way to start to become a fluent musician.

(This is much trickier to talk about today than 50 years ago because of the many anti-musical elements, like drum machines and auto-pitch correction, that have been forced on the public.)

This area is much like speaking with literal emphasis on strong and weak syllables. Consider from Shelley:

If Winter comes, can Spring be far behind?

Here we have 5 pairs of weaker and stronger emphases. This is a kind of poetic meter, but we wouldn’t say it or sing it metronomically. It’s worth thinking about how we would say or sing it to convey the most meaning, while still making use of the nice lilting rhythm as part of the “groove” rather than the control of the “groove”.

The best way to learn this (and music) is to actually listen to others and to do the acts yourself. And try to get someone to help you tune into the meanings and musicalities that are only vaguely implied in the written down forms. (Note how little information is in the script of a play — there is usually less in most written down music.)

In learning to program, you are also using written down forms — and things will happen because of what you write — but the larger object is not a page of code but a running system, and the imagination required here can be greatly aided by advice and a friendly helper. Learning to think in systems terms from the beginning will make an enormous difference in how far and quickly you progress in the art of design.

A central idea here is that much learning is really autodidactic, but even strong autodidacts can greatly benefit by asking questions and getting feedback.

---

Gratefully borrowed from Stephen Fry

I used chatGPT to summarise some of your opinions. What does the real living human think of artificial intelligence condensation of these previously expressed opinions?

Here are two chatGPT summaries, as furnished by Mark Dickson:

1. Criticism of computer science: Write a criticism of the lack of progress in computer science and software engineering as Alan Curtis Kay would write it.

Alan Curtis Kay, a computer scientist and pioneer in the field of object-oriented programming, has been vocal about his critiques of the lack of progress in computer science and software engineering.

One of his main criticisms is that the field has become overly focused on short-term gains and solving superficial problems, rather than addressing deeper, more fundamental issues. Kay argues that the field has become too focused on the "low-level details" of technology, such as the speed and efficiency of algorithms, rather than considering the larger, more important questions of how technology can be used to improve people's lives. He has also criticized the industry's tendency to prioritize features over usability and design, resulting in complex and difficult-to-use software. In addition, Kay has criticized the lack of emphasis on interdisciplinary collaboration in computer science and software engineering, arguing that the field would benefit greatly from more collaboration with other disciplines such as psychology, sociology, and anthropology. He has also pointed out that the field has not done enough to promote diversity and inclusion, which he believes is essential for creating truly innovative solutions.

Overall, Kay's critiques highlight the need for the computer science and software engineering field to focus on long-term progress and solving real-world problems, rather than being solely driven by the latest trends and technologies. He advocates for a more holistic approach that takes into account the larger societal implications of technology, and emphasizes the importance of interdisciplinary collaboration and diversity in fostering innovation.

2. Write a critique of object oriented programming as Alan Kay would write it.

Alan Curtis Kay, who is considered one of the pioneers of object-oriented programming, has also been critical of certain aspects of the paradigm.

He has argued that the way object-oriented programming is typically taught and practiced does not fully capture the original ideas and intentions behind the concept. Kay has criticized the way that object-oriented programming is often reduced to a set of rules and conventions, such as inheritance and polymorphism, rather than being understood as a way of thinking about problems and solutions.

He has also criticized the way that object-oriented programming is often used to create overly complex and inflexible systems, rather than being used to create simple, elegant, and adaptable systems.

Kay has also pointed out that the focus on objects and classes in object-oriented programming can obscure the underlying principles of computation, and that the emphasis on encapsulation can lead to an over-abstraction of the system, making it harder to understand, maintain and extend.

He also has pointed out that the use of inheritance can lead to a proliferation of classes and subclasses, making the system hard to understand and maintain, and that the use of polymorphism can lead to a proliferation of methods and interfaces, making the system hard to understand and extend.

Kay has also said that object-oriented programming should be seen as a way of organizing and structuring code, rather than as a one-size-fits-all solution to programming problems. He also has said that object-oriented programming should be used in conjunction with other programming paradigms, such as functional programming, to create a more flexible and adaptable system.

Overall, Kay's critiques highlight the need to understand the original ideas and intentions behind object-oriented programming and to use it in a way that is consistent with those ideas and intentions. He also points out that object-oriented programming should be used in conjunction with other paradigms, and that the focus should be on creating simple, elegant, and adaptable systems.

ChatGPT Jan 9 Version.

The first one is more interesting than the second (which includes outright errors (such as what I think about encapsulation), and many other things). Both of the examples (I think) are much more vague than my view of my own writing style, and they have other deficiencies. But, if taken as summaries by another entity (human or otherwise), the vagueness is reasonable to expect.

A key omission (how did the “AI” miss it?) is the lack of central concern I have about systems and especially about “what winds up running”, and the “lack of ‘systems consciousness’ in the field today”.

Another is that I’ve repeatedly stated that I don’t think OOP is really a “programming paradigm”, but is rather a universal definition method (a tortured comparison would be to something universal like NAND, and a particular approach to making a computer).

The danger here is that “form very often rules over content” in most of human life, and the current “AI”s have been aimed at mimicking form rather than at understanding content. Given the general state of the public outlook, this is a disaster already well along.

What was the source of documentation and programming knowledge before the internet? What did daily coding look like without access to Google?

The major HW manufacturers had customer organizations (such as IBM’s “SHARE”) to share lore and sometimes code.

In the research game, there were papers from the several main conferences and the ACM, etc., that imparted techniques (these were invaluable).

For down and dirty, one would read code (and most of the code in the 50s and 60s was still machine code in one assembler or another). Sometimes these were commented (and often the comments were not updated when bugs got fixed, etc.).

Typical code looked like this:

[image: some of Margaret Hamilton’s code for the Apollo moon mission Guidance Computer]

This had some extreme side conditions, including a tiny slow “odd” computer, and that most of the code had to be put into a very difficult to make ROM (of “core rope” memory). But listings like this were typically what one wanted to look at, especially the semi-readable assembly code columns in the middle of each page.

Are the slides (Squeak image) available for the Alan Kay talk linked in comment?

They aren’t right now, but — without promising anything — something like them might be made available in 2023 or 2024. Here’s the talk in question: (starts at 2:15 if the link doesn’t do it).

https://www.youtube.com/watch?v=AnrlSqtpOkw&t=135s

Smalltalk had unlimited “projects”, which can be thought of as “multiple desktops persistent over time”, where, in each, all the resources of Smalltalk were available live. This is a workflow idea that would be very nice to have in modern systems.

This (also) means that each project can be used to show an idea — and that (anything) can be interacted with during a talk. Going from one project to another is a bit like going from one slide to another in a presentation system, the difference being that you don’t have to “can” anything and are not limited to e.g. “build” features.

I have used versions of this for many of my talks over the years (presentation systems are terribly confining, and “anti-computer”).

This one was quite a bit of fun because it uses one of the first Smalltalks (1978, and the one Steve Jobs saw) to have “projects”. I was able to do everything without having to add any new primitives to that Smalltalk (the talk was aided because the old Smalltalk was now running on a machine with lots of memory — so it was possible to include many more bitmap pictures than we had room for back then).

However, these freedoms also encourage “Easter eggs”, etc., and mean that many things — often ones that seem magical — can be done in the order the speaker chooses on the fly.

Today’s presentation systems, on the other hand, have a useful feature that things can be advanced by a single control — such as <space bar> or clicking. This is especially useful when someone other than the author wants to run through the presentation (as in the question here at hand).

This would require a few more features in the live Smalltalk: for example, to also “can” interactions with the system. This would generally require not just capturing them (easy enough), but also allowing them to be edited (a much larger chore).

For this historical system, this would amount to adding important features that we didn’t have or use back then (things would start to get confusing as to just what things were like back then). We would not have to change the kernel, but these changes would be outside the scope of what existed back then.

On the other hand, this year 2022 was the 50th anniversary of Smalltalk, and next year is the 50th anniversary of the Alto. Besides museum exhibits and celebrations, the interactivity of the Alto invites an online site so that more people can experience what things were like when the Alto started working in 1973.

The (very complete) precursor of Microsoft Word (called “Bravo”) is very usable today — and it would require quite a bit of scaffolding to make enough of it easily learnable for visitors to an Alto website. The old Smalltalks — because they are live programming and development systems — would require lots more scaffolding. And a use of an old Smalltalk — as in the Ted Nelson tribute I did above — would require even more to allow visitors to smoothly run through it.

However, it would be great to have all the additional scaffolding done for a historical live interactive website — so we are looking at just what would be good (and possible) to do.

We are all indebted to Dan Ingalls for his many versions of the Smalltalks, and now, even more, because he has made a “Smalltalk Zoo” of live interactive versions of many of the old systems. I think the latest version of this is available through the Computer History Museum’s web site — Phillip Remaker found this link to the Zoo:

https://smalltalkzoo.thechm.org/

What is the history of the first personal computer?

I think we first have to come up with an agreed on definition for “personal computer”. What most people think of is something which originally cost in the range of a television set — but I think we could extend that to include other “personally owned things” such as automobiles and houses (but not Lamborghinis or office buildings, etc.).

Using this expanded range, we could look at machines like the G-15 and the LINC, etc.

[image: Harry Huskey’s Bendix G-15, ca. 1956]

[image: Mary Allen Wilkes’ LINC computer (invented by Wes Clark) in her home, ca. 1962]

Please write comments to help the definition, and I’ll try to answer.

What does Alan Kay think of Ruby programming language, since it was inspired by Smalltalk?

As I said to a similar question about Swift: I don’t think my opinion matters.

However, I think the Ruby people did a number of nice things with their design. I think that more kinds of safe reflection and workable MOP, etc., would enhance the good that is already there.

Mainly though, I think that all the programming languages I’m aware of today — including Smalltalk and Ruby, etc. — have failed to be redesigned and reinvented to reflect how Moore’s Law has affected computing resources.

I think that the support for “Moore’s Law” software pretty much stopped in the early 80s, and the new generation of computerists — many now in corporations — started to optimize and patch the older ever more obsolete forms. Once you’ve put on these hats, you can’t even find your “design hat”.

———— Added Dec 3 2022 ————

This is in response to the 5 current comments, which are all about parallelism. That is about optimizing, and what is needed is something qualitatively different.

Namely, that to actually do a real Moore’s Law for SW it is necessary to exponentially change/raise the level of meaning and intent that can be given by the designer/programmers. In other words, to start building programming systems that know about many things beyond how to give variables values and create simple blind relationships. Such a system would automatically avoid most of the recent large company systems crashes. This is analogous to how a good embedded garbage collector simply does not allow any kind of memory crashes.

John McCarthy wrote a paper in 1958, “Programs with Common Sense”, that spelled out a lot. He invented Lisp in order to work on this kind of programming.

Sketchpad was a different approach to the same understanding of the need for a qualitative jump. Basically, in Sketchpad, the programming was done in terms of requirements for what you wanted, and Sketchpad had three problem solvers that could find solutions to the multiple requirements given to it. Its lower level was the first OO framework, but the programming was cosmically aimed at a future where the main programming is all done in terms of meaning, in ways that are quite separated from any kind of optimizations.
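
To make the “programming in terms of requirements” idea concrete, here is a toy sketch (mine, in Python; the names and the solver are purely illustrative and not from Sketchpad) where you state constraints and let a very naive relaxation solver find coordinates that satisfy them. Sketchpad’s real solvers were far more capable.

    import math

    # Points are mutable [x, y] pairs; each "requirement" nudges its points
    # a little toward being satisfied, and returns how far off it still is.

    def keep_distance(p, q, d):
        dx, dy = q[0] - p[0], q[1] - p[1]
        cur = math.hypot(dx, dy) or 1e-9
        err = cur - d
        ux, uy = dx / cur, dy / cur
        p[0] += ux * err / 2; p[1] += uy * err / 2   # move each end half the error
        q[0] -= ux * err / 2; q[1] -= uy * err / 2
        return abs(err)

    def keep_level(p, q):
        err = q[1] - p[1]
        p[1] += err / 2; q[1] -= err / 2
        return abs(err)

    def solve(requirements, passes=500, tol=1e-6):
        # repeatedly apply every requirement until the total error is tiny
        for _ in range(passes):
            if sum(apply() for apply in requirements) < tol:
                break

    a, b, c = [0.0, 0.0], [3.0, 4.0], [10.0, 1.0]
    solve([
        lambda: keep_distance(a, b, 5.0),   # "a and b must be 5 apart"
        lambda: keep_distance(b, c, 5.0),   # "b and c must be 5 apart"
        lambda: keep_level(a, c),           # "a and c must be level"
    ])
    print(a, b, c)   # coordinates that (approximately) meet all three requirements

The point is not the solver but the shape of the program: it says what must be true, not how to compute it.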

How is an object created? What are the important parts necessary to create an object?

This is an interesting question — I’ll take it to include “… on a vanilla computer with no ‘object language’ already present”.

My view back in the 60s: since any computer with enough memory can simulate any other (including future computers), then a simulated (“virtual”) computer can also define every part of any computation — and this includes not only processes, but what was then called “data”. The simulated computers would be running code of some kind, so multiple processes would have to be protected from smashing each other, etc. So this would be like the dividing of a time-shared computer into separate protected processes that can communicate with each other. The communications would be much safer if they were “requests” rather than “commands”. And so forth.

It should go without saying that any useful computation process should also be able to be instantiated in various ways.

Basically, a software version of computers on a network or bus (and very much influenced by the ARPAnet that was then being worked on). A good larger vision of this is that “everything should be a server”.

This is really easy to do, except for: safety, and efficiency of time and space. So — in practice — a lot of software engineering is required to make the schemes practical and useful. Once done, there is an opportunity to make highly scalable systems from the objects — this is a design task that is quite separate from making objects (terrible systems have been done using object components, etc. — and a few nice ones as well).

You can figure out what to do next to allow “messages”/”requests” to work — note that the form of a procedure call will work — it includes a “target” and parameters for the request, etc., but that the semantics has to be expanded in a number of ways.

For example, the target has to be able to ignore the request (it may be from a dangerous sender); the target may need to acknowledge the request but delay the actual answer or service; the target may need to gather knowledge over time to help it serve; the target may need to remember and undo its services; and so forth. A really simple one is that standard subroutines are set up to establish a “history chain” of calls that then needs to be unwound in reverse order — this has to be modified at several levels: to allow other processes to run in various semi-concurrent ways, and to not require a history chain (maybe something more comprehensive, etc.). The instantiation of procedures has to go beyond re-entrancy and recursive use, etc.

The basic idea here is to think long and hard about systems and what is needed to make them, and then to make a basic “node” that can do any and all that is needed.
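
As a minimal sketch of such a “node” (in Python, with names I made up purely for illustration): each object is a tiny server with private state and a mailbox, and the only way in is a request, which the node is free to answer, delay, or ignore.

    import queue, threading

    class Node:
        # A tiny "software computer": private state, a mailbox, and a loop that
        # decides, per request, whether to answer, delay, or simply ignore it.
        def __init__(self):
            self._state = {}
            self._inbox = queue.Queue()
            threading.Thread(target=self._run, daemon=True).start()

        def request(self, selector, *args):
            # The only way in: post a request and get back a "promise" to wait on.
            reply = queue.Queue(maxsize=1)
            self._inbox.put((selector, args, reply))
            return reply

        def _run(self):
            while True:
                selector, args, reply = self._inbox.get()
                handler = getattr(self, "do_" + selector, None)
                if handler is None:
                    continue                     # free to ignore unwanted requests
                reply.put(handler(*args))        # the node decides what an answer means

    class Counter(Node):
        def do_increment(self, n=1):
            self._state["count"] = self._state.get("count", 0) + n
            return self._state["count"]

    c = Counter()
    promise = c.request("increment", 5)   # a request, not a command
    print(promise.get())                  # 5; the sender waits only when it chooses to

Nothing outside the node can see or smash its state; all a sender can do is ask.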

A more sophisticated scheme could be to send requests to a “broker” process that can find the best servers (take a look at what LINDA does, etc.). This could be “call by need” — in other words, you don’t know the name of the service, but you can describe what you need, and the brokering processes will do the finding for you …
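
The broker idea can be sketched in the same toy style (again purely illustrative, and not LINDA’s actual mechanism): servers advertise what they can do, and a requester describes what it needs rather than naming a server.

    class Broker:
        # Match "call by need" requests to servers by what they advertise.
        def __init__(self):
            self.offers = []                  # (capabilities, server) pairs

        def advertise(self, capabilities, server):
            self.offers.append((set(capabilities), server))

        def request(self, needs, payload):
            for caps, server in self.offers:
                if set(needs) <= caps:        # this server can satisfy what was asked
                    return server(payload)
            raise LookupError("no server advertises " + str(set(needs)))

    broker = Broker()
    broker.advertise({"sort", "numbers"}, lambda xs: sorted(xs))
    broker.advertise({"sum", "numbers"},  lambda xs: sum(xs))

    # The requester never names a server; it only describes what it needs.
    print(broker.request({"sort"}, [3, 1, 2]))   # [1, 2, 3]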

Bottom lines here are that (a) “something vaguely like an object” can easily be made from any vanilla computer resources, (b) most such things in most current “OOP” languages are neither very good renderings, nor is their use enlightened, (c) in the current day of many doublings via Moore’s Law since 1966 (perhaps about 35 doublings), a version of OOP is needed to deal with current scales and needs, and (d) having great objects does not say all that is needed about how to organize them — this is one of the greatest ongoing issues (i.e. “OOP” is not really a programming paradigm, but a universal process definition scheme).

What does Alan Kay think of Neil Postman's invited talk at Apple in 1993, and are his main arguments even more pertinent & relevant now 30 years later?

I think the referred to talk was:

https://youtu.be/QqxgCoHv_aE

(there is some noise on the left audio channel — balance to the right to help.)

Neil and I were good friends. I set up this talk and invited him to have a go at a room full of rather set opinions.

He once told me a story that gives a lot of insight into him and his approach. When Neil was in grad school in the late 50s, he and his fellow student Charlie Weingartner got interested in Marshall McLuhan and started following him around to his talks. Neil said they soon noticed that McLuhan did not respond to questions or arguments from the audience. Instead, he would say “how about this one?”, and come up with another of his “from right field” weird cryptic aphorisms. Neil and Charlie eventually realized that McLuhan didn’t care whether anyone agreed with him — what he cared about was whether the audience was actually thinking at all! He viewed his purpose as a wake-up call.

It’s also important to understand that Neil Postman not only could really write, but that he was extremely conscious of his writing process, control of rhetoric, and the vast differences between oral and literate societies, world views, and expressiveness. The result was both some of the most readable serious prose during his time, but also his oral talks were actually written beforehand, and so well that he could read them and still be “oral” for his audience, but “literate” as to depth.

In my opinion, this made him the best, most accessible, and most important public gadfly of his time. He could write “Education as a subversive activity” at the beginning of his career, and a bookend at the end: “Education as a conserving activity”, and make both views be even more valuable together and supporting than their initial seeming opposition.

He could watch television, and condemn it without being a hypocrite. He could inveigh against “technology” in deep ways, while still using it, and without becoming a Luddite. Neither the term “conservative” nor “liberal” fits Neil well.

He was essentially an Artist reporting what he saw and thought about during his life. I found him to be one of the most endearing, genial, and civilized people I’d ever met.

He knew — as did McLuhan — that, down deep, public utterances are at best “commercials” and “carrots” for the much harder and longer pondering that most of us need to — have to — do to start grappling with the real issues. I miss him very much.

So, with the above as context, what do viewers think Neil is trying to get the audience to think about?

Do you think that Alan Kay was more focused on children than education? Why?

H.G. Wells said it better than I can: “Civilization is in a race between education and catastrophe”. His notion that education could be a deterrent to catastrophe was part of the age old “nature vs. nurture” questions that gained ground as Anthropology and Psychology delved deeper into trying to understand humans, their cultures, and behaviors.

Children who are deprived of a human cultural surround when growing up — such as “Genie”, who was raised in a closet with no contact, or the various “wolf children”, who were raised with animals — do not develop normally. Much of what we consider to “be human”, such as language, etc., is not embedded by genetics in a child’s brain at birth — instead, genetics embeds “desires, interests, propensities, etc.” that draw the child’s attention to specific examples in the environment to be learned as behaviors. For our species, this process is so strong that it forms the “prime idea of anthropology” — a child at birth can be taken anywhere on earth to another culture, and will grow up as a member of that culture, thinking the thoughts of that culture, being that culture, etc.

A key idea here is that we humans treat what is more or less constant around us as “normal” — and, critically, we shortcut “normal” in our minds to “reality”. Because our nervous systems are tuned to differences, we pretty much disappear “normal” as an idea that is relative to surrounds, especially cultural surrounds, and simply take most of our circumstances as “real” rather than “mental constructions”. What we believe mostly determines how we see and interpret the world.

The deep “nature vs. nurture” question is: to what extent can our impulses from genetics be constrained and augmented by learning — and especially: early learning? For example, could the US Navy SEALS slogan actually be true? I.e. “Under pressure, you don't rise to the occasion, you sink to the level of your training”.

There are many examples where training/education does augment — even replace — some of the behaviors arising from our genetics. But could this be enough to make important differences in larger human societies?

I don’t know the ultimate answers here, but have long been interested in the wide spreads of behaviors that are associated with “humans in the raw” and “humans in society”, and especially the extremes, which often cut both ways. An often examined example is the transformation in the 20s and 30s — in less than a generation — of Germany from one kind of stability to a very different dangerous society — and then, after WWII, back to a very different kind of system from either.

Readers who like to think in terms of systems will reflect that examples like these don’t necessarily mean that every individual in e.g. Germany, was swung from one extreme to the other and back again — but that the societal system — the overriding culture — manifestly did (and rather quickly).

Another systems perspective of this has to do with thresholds for individual elements of systems that are needed to have the larger organization be stable enough to “work”.

A famous reply by Jefferson in response to whether democracy could ever be made to “work” was: “I know of no safe depository of the ultimate powers of the society but the people themselves; and if we think them not enlightened enough to exercise their control with a wholesome discretion, the remedy is not to take it from them but to inform their discretion by education”.

I think it’s fair to say that this central idea — that education in a democracy, above all has to raise citizens above the threshold needed to participate — has been absent in US education for so long that the education establishment doesn’t realize it ever existed as an idea and goal.

This doesn’t mean teaching a party line, but it does mean teaching e.g. how to argue with others in such a way to illuminate difficulties to aid solutions (as opposed to “trying to win an argument”).

This idea about arguing to make progress was deeply embedded in the ARPA research community, and was a vital part of its success: it allowed the widest view of ideas and issues to be brought forth, fostered cooperation, and damped out fruitless contention and competition of egos. (The ARPA process apparently started in the radar project at MIT during WWII.)

An interesting question beyond whatever the needed educational/training processes might be, is “when in life can they best be taught and retained?”.

Most indications point to childhood, especially the first 7–10 years — this is when many mental commitments to “world view” (aka “reality”) are most strongly made.

My own interests in all this went from “vague understandings” to “vivid thoughts” after meeting Seymour Papert in 1968, and seeing what he was trying to do with children and thinking, via giving them a rich environment to play with ideas/processes through the interactive LOGO computer system.

A vital point about what Seymour was doing is that — even though he was an excellent mathematician and came up with many mathematical examples for LOGO — his main aim was “world view” (a fancier larger term for this is “epistemological surround”).

He wanted to find ways to qualitatively enrich what children grew up taking as “reality”. When he said that “the best way to learn French is to grow up in France”, he didn’t just mean the language, but also the culture (metaphorically meaning: modern ways to view and think about our situation).

This strongly echoed Maria Montessori, whose main underlying aim was the same — she went about it by pointing out that “you can’t learn the 20th century in a classroom”, but must somehow be able to grow up in it. She decided to embody this in the fabric of her school to make it be “the 20th century”.

There are literally many more critical ideas that are part of this: too many for a Quora sized answer.

A big one — in size and amount of understanding required — is how we are affected by “media” — intermediary things, tools, processes, ideas — that are part of our environmental surround. McLuhan quipped “We become what we behold!” via intermediaries — in other words, there is nothing neutral about tools/communication systems — what counts is what changes happen in our brains when we learn anything, especially tools and media — but he said almost nothing about our difficulties in really beholding.

For example, what we most learn from a tool — such as a hammer — is that we “can hammer” and “how to hammer” — the pragmatics of “hammer” — but we have great difficulties in coming up with larger ideas about “hammering” (i.e. don’t hammer others when you get upset, etc.). Doug Engelbart realized this in his first papers about “augmenting humans”: that providing humans with ever more powerful tools is a disaster — between the human and the tool has to be education/training, not just in use, but in “use for what?”.

This means that when a new powerful tool/medium comes along, the burden is not just how to teach the use of it, but the “use for what?” and the “what is it really?” of it. In other words, we need to learn how to insert barriers that prevent simple-minded pragmatism between us and our ever more powerful intermediaries.

Vi Hart: “We must ensure that human wisdom exceeds human power”.

Much of modern science is about how and why to go past “the world is as it seems” to understanding that the world is “mostly not” as it seems — much of what’s important is “behind the curtain” — and thus everything needs to proceed much more slowly and deliberately in order to get more valid views to “make the invisible more visible”.

Thus the main “world view” of science strongly obtains in most other walks of life as well, especially in matters of society, its “discretion”, and how to raise it. The perspectives of the new branch that is systems science can be particularly helpful.

We should be able to see that simply “teaching and using tools” is not just futile, but very dangerous with powerful tools (this includes not just recent tools — such as networked personal computing — but also older tools — such as reading and writing).

So, in the race between education and catastrophe, we must learn to deeply understand catastrophe, education, intermediaries, society, humanity, and children, and then take major actions for all.

What did Kay mean by “I wanted to get rid of data. The B5000 almost did this via its almost unbelievable HW architecture.”?

One way to think about classical “data” is that it is something passive that can be munged by an assignment operation. Procedural languages generally have variables and data structures that can be munged — so-called “OOP” languages generally permit “setters”, which also allow munging.

The Burroughs B5000 computer with many very different features was revealed in 1961. It was designed to efficiently run Algol 58, and much of Algol 60, more or less directly. Much has been made about its hardware stack, but this was just the tip of the iceberg. The most interesting features of the machine had to do with safety of both state and processes, and especially how memory was structured.

A word (48 bits) in memory was marked as either a number or a “descriptor” — a protected pointer to hardware or software entities. In “regular mode” these pointers could not be forged. One use was to implement virtual memory, via a “presence” bit in a descriptor. If on, the core address of the segment was in the rest of the descriptor; if off, the rest of the descriptor had the disk location of the segment, and a reference would automatically bring it back in.
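
A rough software rendering of the presence-bit idea (a sketch in Python with invented names; the real machine did this in hardware) might look like this:

    class Descriptor:
        # A protected pointer: either its segment is already in core, or the
        # descriptor knows where the segment lives on disk and faults it in.
        def __init__(self, disk_address, disk):
            self.present = False
            self.core = None
            self.disk_address = disk_address
            self.disk = disk

        def fetch(self, index):
            if not self.present:                               # presence bit off:
                self.core = self.disk.read(self.disk_address)  # bring the segment in
                self.present = True
            return self.core[index]

    class FakeDisk:
        def __init__(self, segments): self.segments = segments
        def read(self, address): return list(self.segments[address])

    d = Descriptor(disk_address=7, disk=FakeDisk({7: [10, 20, 30]}))
    print(d.fetch(1))   # 20, and the segment was transparently paged into "core"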

An important idea is that the “byte-codes” of the B5000 did not contain storage references and could not directly see memory. So the OS (also written in an extended Algol) would grant an environment for code to run that the code could not hurt. All this was the basis for “capabilities”, and made the B5000 almost uncrashable in practice.

I should also mention that the machine had at least two CPUs and an automatic process switching mechanism.

A very interesting feature was how the B5000 handled code such as

A <- B

The byte codes looked like

<name call: A> <value call: B> <operation: <- >

The first syllable wants an address for A on the stack. Next a value of some kind is put on the stack (B stands in for any expression). Finally the <- is done to the top two items in the stack.

The interesting wrinkle is that if A contains a procedure descriptor, that procedure will be called with a flag indicating that a name should be the result. I.e. that the A is on the left side of the assignment statement. (This later led to the idea of “L-values” and “R-values” in languages like BCPL).

This meant that the B5000 could hold off munging and protect structures at will. This was pretty neat!
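
Here is a toy rendering of that byte-code sequence (in Python, with my own names; nothing here is the real B5000 encoding). The interesting case is when “A” holds a procedure descriptor, which is then asked, with a flag, to produce the “name” to be stored into.

    class Cell:
        def __init__(self, v=None): self.v = v

    memory = {"A": Cell(0), "B": Cell(42)}

    def name_call(sym, stack):              # <name call: A>
        slot = memory[sym]
        if callable(slot.v):                # a procedure descriptor:
            stack.append(slot.v(want_name=True))   # ask it for the assignable place
        else:
            stack.append(slot)              # otherwise the cell itself is the "name"

    def value_call(sym, stack):             # <value call: B>
        stack.append(memory[sym].v)

    def store(stack):                       # <operation: <- >
        value = stack.pop()
        name = stack.pop()
        name.v = value

    stack = []
    name_call("A", stack); value_call("B", stack); store(stack)
    print(memory["A"].v)                    # 42

    # Now let "A" hold a procedure descriptor that controls where the store goes:
    backing = Cell(0)
    memory["A"] = Cell(lambda want_name=False: backing)
    stack = []
    name_call("A", stack); value_call("B", stack); store(stack)
    print(backing.v)                        # 42: the procedure chose the "name"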

I thought about this a lot as a grad student. One example that couldn’t be done “right” with this scheme was modeling sparse arrays, because the name call was early and separated. This eventually got me to abandon the idea of “<-” as an operation, and to think of it as a kind of parameter:

A <- B should be something like A(“<-”, B) … which (wonderfully) could actually be done in Algol.

In other words, the idea is to have encapsulated entities which can only receive messages as requests (not commands), and they get to control what state and state change means. I thought of this as abandoning imperative programming for “request programming”. So the entities would be much more like “servers” than passive vulnerable bits.
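
As a small sketch of that shift (again in Python, with illustrative names of my own): the entity receives “<-” as a request like any other and decides for itself what storing means, which is exactly what the sparse array case wants.

    class SparseArray:
        # An encapsulated entity: "assignment" arrives as a request it interprets
        # (storing only non-default values), never as a command that munges it.
        def __init__(self, default=0):
            self._cells = {}
            self._default = default

        def send(self, selector, *args):
            if selector == "<-":                  # A(i) <- v  arrives as  send("<-", i, v)
                index, value = args
                if value == self._default:
                    self._cells.pop(index, None)  # the entity chooses what "store" means
                else:
                    self._cells[index] = value
                return self
            if selector == "value":               # A(i)  arrives as  send("value", i)
                return self._cells.get(args[0], self._default)
            raise ValueError("request not understood: " + str(selector))

    a = SparseArray()
    a.send("<-", 1000000, 7)
    print(a.send("value", 1000000))   # 7
    print(a.send("value", 3))         # 0: the default; nothing was ever stored there
    print(len(a._cells))              # 1: a million "cells" were never materialized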

Why is school & college the way it is, limiting most children's potential?

This is why good questions are very often better than answers — especially when they are about systemic problems.

Sometimes, when in this position, just working on “how things should be” will get further than trying to figure out exactly how things got to be so bad.

So, let’s turn the question around to ask: “How can we best help children to maximize their potentials, while using the best conceptions of education we can come up with?”

Again, this doesn’t have complete crisp answers, but it does give us some starting points.

For example, a really good starting place would be to have society really understand what Maria Montessori wanted to do and did to help children — in part because she took on our second version of the question as her life’s work, and — as an authentic genius — came up with many great insights and starting places.

I think reconciling an enlightened understanding of the best world for a child with best conceptions of what adulthood should be like is a very good way to start working on the future.

It is hard to see how a start can be made without a combination of idealism and realism that is the heart of modern thinking.

What is the major obstacle in teaching computing in general and coding in particular in primary and secondary schools?

This is a good question that can lead to even better questions.

First, I think the problems with most of education — including about computing — are systemic, with many parts failed or missing. In addition, the current failed system is large and active in protecting what’s left.

Second, “coding” today is a trick word that has been co-opted for a too-tiny subset of programming, and even its larger meaning from the past misses what “programming” used to signify.

Curriculum needs to not just be better, but qualitatively different. Like writing, it is a lot of work to even produce a flop, let alone something really good.

Of the many problems of curriculum, one is the strong tendency today not to include any important matters where it is felt that “the teachers are not up to it”. (I’ve been in many meetings where good ideas never made it for this reason.)

Most teachers are working like mad, and are in very difficult situations regarding students, cultures, management, politics, etc. before even grappling with 21st century high knowledge. And one of many causes of these problems is that today’s teachers are generally themselves the product of the failed system.

However, the state of the teachers is a critical part of any improvements that might be attempted.

Human beings are set up by nature to learn from their cultures: however, writing, math, science had to be invented, etc. In many senses, school exists to teach the hard to learn stuff that doesn’t arise naturally in cultures via genetics and traditions. One of Montessori’s great ideas was to make the whole school the embodiment of the way the outside culture should be by embedding it in the processes of the school (rather than trying to teach the ideals directly in a classroom). I think this is a very good idea!

However, no teachers were more trained or deeply vetted than Montessori’s. Right now our society does not actually value education, and is not up to supporting it by vastly improving teachers in every way — including paying the good ones today what they are really worth.

If we take “computing” seriously — and I do — then in its relative youth, it needs analogies to similar difficult fields to help see how it should best be taught. Two that can be helpful are “writing, reading, literature”, and real mathematics for the general public.

The first of these is easiest to discuss. School “likes output from students”, and it has shown that it doesn’t care much about the quality or relevance of the output. Processes have tended to shrink to the easily testable output that the school systems feel comfortable with.

Frank Smith points out that ideas are at the roots of writing and reading, and these are not only the core of “literature”, but are larger in many ways than current literature itself. For all these with regard to children, we need to look at children’s books of ideas that are not only readable, but upliftable. If we look at ideas in the 19th, 20th, and 21st centuries, we need to recognize that many of the most interesting and central involve science, systems, mathematics, technology, and improving how to think. (It is hard to find children’s books with much real content about any of these — or to find teachers.) On the other hand, one of the most important traits of science, systems, math, etc., is that they are arts, and thus defunding the arts to teach e.g. science is robbing Peter to pay Paul, missing the point really badly. Without the ability to sense and appreciate and make beauty, much of modern striving is a dead end.

It’s worth thinking about writing for a moment. It is a big deal to learn how to write sentences, but an enormous error to confuse this with “writing”. What’s most difficult in writing is to create a larger system of connected ideas while retaining something graceful in the small.

I think we can apply this directly to thoughts about computing and its learning. Computing is, over all, really about systems, not computers. And learning about systems, and thinking in systems terms, is right up there as a new standard subject for all children on their way to adulthood. I think we can put “ideas in systems terms” into our version of Frank Smith’s core ideas. Now we have to think about — and in the world of how children think about things — what should “reading” and “writing” be?

Good answers will require not only a revolution in public schooling, but a revolution from what is now a kind of pop culture around computers, that much of the time can’t see the systems for the bits.

Lots of real deep work ahead.

What was the staff size at Parc?

Parc had 4 labs: computer science, systems, physics, optical. The ARPA culture that formed most of the computer part of Parc pervaded most of the first two, and none of the others. My group and I were in the Systems Lab and both Systems and Computer Science labs worked closely with each other (the reason for two labs instead of one was mostly a funding ploy by Bob Taylor).

A lot of what Parc is known for was done in the first few years by both labs. Butler Lampson — a principal figure at Parc — has estimated that the number of researchers was about 25. The balance of researchers to support staff at Parc was about 1:1.

An important historical fact was that Xerox did not allocate headcount on the basis of budget, but used “slots” (which were like gold). Whenever a slot became available we would try to find and hire a star. This led to a very unusual (and effective) makeup of lab denizens!

This is one point in time. A few years later I could count about 40 or so researchers.

The other two labs were similar in size and support. A simple calculation shows that Xerox paid for all of Parc many times over with just the profits from the laser printer alone …

A possibly interesting side note is that the salaries were not high at Parc, but that the average yearly hardware support for each researcher was very high: much more than their salaries. This is because the aim and reality of Parc was to spend enough money to “live in the future” by anticipating what would be possible with Moore’s Law in 15 years, and to duplicate those conditions for the researchers by inventing and building that future. This was relatively expensive per person, but the low head count made it affordable.

Could Xerox PARC have been done remotely with today's technology (not necessarily methods)? What does Alan Kay think about "remote work" for collaborative research?

This is an interesting — and tricky — question. For one thing, attempts to create “a Xerox Parc” in any form have not been successful. I think the previous ARPA community was critical to helping shape most of the researchers who formed Parc, and that many of the critical researchers at Parc were not replaceable.

For another, among many aims, Parc was part of a tradition of research to create “remote work for collaborative research” predating Parc (e.g. Engelbart) — and Parc had several active projects around this idea, as did Negroponte’s Arch-Mac group at MIT.

The “today’s technology (not necessarily methods)” part is also critical because serious technological resources are required to even have a chance. For example, while I was at HP and Disney I got to see and use HP’s “Halo” collaboration technology, done originally for Dreamworks to do live media movie editing between the UK and LA, and then was later able to see Cisco’s experimental collaboration tool (which had been used between Hong Kong and LA to build itself). And later, to see Ishii’s “Liveboard” inventions, Applied Minds’ schemes, etc.

Another tricky part is to notice that what we were working on at Parc would have been easy or not needed at all if the technology to create it remotely had existed.

The best answer I can come up with here is that some of what made Parc work could have been done remotely — but I don’t think all. I think much of the actual fabrication of SW could have been done remotely — but the main work of the SW (the problem finding and design, etc) would have been difficult to impossible remotely.

Part of the problem of remote collaboration is to solve the problems of both “getting things done together” and “arguing without alienation” — especially in groups with mismatches. I think the HP and Cisco technologies did a very good job of transmitting enough “presence” to get a lot done, and also to allow for a fair depth of arguing.

There have also been enough experiments in eye contact and how to do it, to go further. There are deeper problems of trust and trust building that are really hard to do with remote technologies, and also problems of “intimidation” and social coercion that need to be looked at. (For example, take a look at “focus” at a live concert or theatre vs. canned versions of the same thing, and think about intermediaries to these experiences.)

Both “goofing around with each other” and the nature of the “goofing” were critical to Parc, and quite a few of the most important visceral processes are not readily subsumed by any kind of technology.

What are some notable reimplementations of SketchPad by Sutherland?

I don’t know of any reimplementations — but I wish there were.

Alan Borning’s “ThingLab” thesis at Stanford/Parc improved on a few of the Sketchpad ideas, but did not cover the whole territory. He is also responsible for numerous constraint solver systems — including Cassowary, which is the heart of many of the mechanisms for doing automatic layout, especially of web pages, etc.

Sketchpad was one of the deep roots of CAD — I don’t know of any CAD systems with the “deep feel” of Sketchpad, but there is lots of “surface influence”.

The bridge demo in Sketchpad has been redone a few times — I have used a version by Yoshiki Ohshima and Vanessa Freudenberg in a number of talks now online (one was the “Turing Centenary” talk).

Brian Silverman has wanted to emulate the TX-2 to run the original Sketchpad assembly code, but for a variety of reasons, including the ever changing TX-2, this has not happened.

Ivan’s thesis is likely clear enough so that a reengineered version of it could probably accomplish a lot. This would be a good target for an MS degree I think.

How did Alan Kay decide to take inspiration from a biological system - (operations of a biological cell) to an (human) engineering system namely object oriented programming…was it just random inspiration or was he intentional about it?

The actual history of the particular set of inspirations is given in “The Early History Of Smalltalk”, written for the ACM “History of Programming Languages II” in the early 90s.

Many of these kinds of decisions depend on how loosely or strictly one sees analogies (I’m on the loose side, but short of being a mystic).

One of my major concentrations in undergrad was Molecular Biology (the other was pure Math). In 1965 Jim Watson published “The Molecular Biology of the Gene”, which among other things had one of the earliest assays of a living organism. And of course, I learned the biochemistry of the level below life.

I was a working programmer at the National Center for Atmospheric Research, but was not “thinking systems” at this point — I was basically a “data app” programmer.

The next year I went to grad school, and accidentally wound up in an ARPA computer research department. ARPA was all about systems, and the first one I saw shocked me into waking up and looking around. This was Ivan Sutherland’s Sketchpad. A week later I had to learn the up-to-then-unknown language Simula. Then I learned about the plans for the ARPAnet. Etc.

As I recount in the history, I was finally presented with enough examples to get me thinking about how to handle complexity in somewhat the way Biology has done it since the Cambrian Era: by eventually being able to combine billions, even trillions, of components using universal building blocks (for example, our human bodies start with one cell and one set of DNA, and wind up with about 10 trillion cells of about 250 cell types, still with the same DNA). Each cell uses quite a bit of “overhead” to maintain itself, and contributes a little to the whole organism.

When the practical world of computing is factored into this (especially in the late 60s), what one gets is a vision that is a bit more comprehensive than either Sketchpad or Simula, but more or less along similar lines. This requires omitting many interesting other things that Biology has to offer, but you still get a lot.

Finally, I should mention that I was not trying to invent a programming language, but rather an extensible kernel — a definitional scheme — for specifying universal “cells” which could then be organized into a desired system. For example, Smalltalk is actually the idea of definition; it doesn’t tell you how to program. The “image” — an example system that is shipped with Smalltalk — is but one of many ways to use the definitional idea.

(In practice, we should have explained this distinction better.)

What can you say that everything you see in nature is governed by the patterns based on mathematical concepts and theories? What is it about mathematics that might have changed your thoughts about it?

IMO, this question seems backwards — it is far more likely that nature is just the way it is, and we are the “pattern assigners” to the phenomena we can detect. I say “IMO” because the overall question is more philosophical than scientific (which IMO is a higher level of opinion than this).

We call the patterns we use to reason with mathematics. We like stories that seem to have some coherence, and the deductive/inductive framework helps make better stories. Explanations of phenomena are never absent from the thousands of cultures that have been studied.

The big shift from “math thinking” to “science thinking” started to happen when phenomena and their explanations were looked at much more closely, and more evidence of special kinds in phenomena started to be demanded.

For example, there is no evidence that “nature is governed by the patterns based on mathematical concepts and theories”.

Two quotes frame up this point of view. The first, from Kenneth Clark: “People in the Middle Ages were as passionate about Truth as we are, but their sense of evidence was very different” — and from Albert Einstein: “As far as the laws of mathematics refer to reality, they are not certain, and as far as they are certain, they do not refer to reality”.

People who like pondering ideas like this should get and read the well known physicist Eugene Wigner’s essay “The Unreasonable Effectiveness of Mathematics in the Natural Sciences”.

How did Xerox PARC invent the mouse?

Xerox Parc didn’t invent the mouse.

The original mouse was invented by Doug Engelbart and Bill English in 1964 as part of investigations into how to use pointing and gestures as input to interactive computing systems.

image

image

This uses an old idea — that dragging two disks at right angles will produce differential spins of each disk depending on the direction of drag. In this case, each disk is attached to a potentiometer that will change the voltage through it at the same rate. This can be interpreted as distance traveled in x and y.
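
As a hedged illustration (the names and the scale factor here are made up, not taken from the actual Engelbart/English hardware), here is a tiny Python sketch of how the two voltage changes, one per disk, can be read as x and y distances:

    # Hypothetical sketch: each disk's potentiometer voltage changes in
    # proportion to how far that disk has rolled, and the two disks at
    # right angles split any drag into its x and y parts.
    def drag_to_xy(delta_v_x, delta_v_y, volts_per_inch=0.5):
        """Turn the two voltage changes into inches traveled in x and y.
        The volts_per_inch scale factor is invented for illustration."""
        return delta_v_x / volts_per_inch, delta_v_y / volts_per_inch

    # A diagonal drag changes both voltages a little:
    print(drag_to_xy(0.25, 0.25))   # -> (0.5, 0.5): half an inch each way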

This worked just well enough, and mice were relatively easy to make. Engelbart and English soon moved to a smaller 3 button mouse which was used for the NLS system of the “Mother Of All Demos” in 1968.

For the Alto at Parc in the early 70s, we initially wanted tablets, but good ones were still quite expensive to make. We looked at the NLS mouse for a variety of reasons (another story) and wanted one that was of higher resolution, and that was small enough to allow drawing with it. The first Parc mouse used digital shaft encoders attached to small ball bearings, and IIRC had a resolution of over 200 to the inch (250?).

image

I really loved that mouse. However, it would get dirty, so other attempts were made, including a very successful “ball mouse”, and an equally successful first optical mouse (Dick Lyon).

Later, at Atari, we did a “flea” which was a very small optical mouse (it worked really well).

What became of the work done at VPRI?

The website for papers still exists:

http://www.vpri.org/

You can also look at “Croquet.io” for ongoing work. Also take a look at “worrydream.com” for Bret Victor’s work.

What was the oldest computer or device you remember using? How does this compare to the machines you use today?

The first programs I wrote that ran on a computer were done ca 1962 in the Air Force for an IBM 1401 with 8k 6 bit characters.

However, while in grad school a little later I got to write a few programs for the Lincoln Labs TX-2, some of whose HW dated to ca 1956.

If Alan Kay was doing a PhD thesis today, would he write all software from scratch? Is that more than what is actually needed to get the point across?

Ivan Sutherland once pointed out that “A thesis is something 3 people will sign”, and Dave Evans saw getting a PhD as a “necessary evil”, a “union card” that you needed to get but shouldn’t spend much time on. These were two of the three signers of my thesis.

Both of these computing greats were talent oriented, and wanted to get as many talents of as many kinds into the research community as possible. Here “What Is Actually Needed” was “a significant advance in computing without using up much future research time”.

They also encouraged consulting “because that is the real world” etc.

One way they did this was to encourage and allow students — if so minded — to choose and work on their own projects, to see how they functioned as “real researchers who didn’t yet have a PhD”. These projects could be big ones or smaller ones. They wanted you to finish the latter, but didn’t require it on the big ones.

Dave Evans got me a job consulting on a small desktop computer project at an aerospace company almost immediately after starting grad school, and several years later when I asked Dave what I should do for a thesis, he said “Just write up what you are doing”. That turned out to be a very untidy mishmash of ideas, but many helped later on at Parc. “Three people”, including Dave and Ivan, signed this, and I didn’t look back.

I did write code as part of this, including microcode, but the best code was written by better coders later. The “thesis” was “It is now time and very possible to invent an interactive graphical personal computer”, and the mishmash was a lot of the things that needed to be done, and how they could be approached, and sometimes with real examples.

I think it was signed because — as rough as the mishmash was — I had made my point.

On the other hand earlier, Ivan had sole access to a supercomputer 3 hours a day and programmed and designed a tour de force — Sketchpad — in a year. He still used some tools done by others. He really made his point (I think much better than I made mine) — but both points were made.

---

A side comment: If we look at the question, and take “what is actually needed” (WIAN) as a threshold that implies a qualitative difference between below and above, then it seems clear to me that we only have to do “all software from scratch” if that is part of the point that needs to be made (and in a few cases it might be, or there might be no other route).

Another comment: It’s worth noting that the “before computer” standards and norms don’t really work well for many computer PhDs. One facet of the field can be looked at as “math”, and so at least part of this can be done solo. Another facet can be looked at as “design” and can sometimes be done solo (but good design often requires testing and iteration, which often requires building). And then much of computing has to do with real world (and real time) issues, which do require not just building, but often a fair amount of engineering (in my experience, the best stuff has a combination of invention and engineering and iteration).

But grad schools generally don’t cater to the actual processes of computing. This puts the “single contributor” idea of a PhD in question.

Another problem is that — today especially — most grad students do their computing on vanilla laptops etc, which means they are computing in the past, and also competing with highly optimized SW done by teams. In my era, we used or built “time machines” — supercomputers etc — that allowed a single user to “compute and think in the future” (that would come via Moore’s Law). This was one of the keys for the best stuff in the 60s and 70s.

So: ways to do a real PhD as part of a large real invention team would help. Being able to use supercomputer cycles to avoid having to optimize would also greatly help. Getting universities to revamp their theories of the field and what a PhD should mean would really help. Etc.

Also: see the comment by Jennifer Ramirez below, and my reply.

How does Alan Kay think Ivan Illich performed as a problem-finder?

I think anyone interested in education past, present, and especially future should read Illich’s books — particularly “Deschooling Society” — for an early deep critique of standard schooling — and “Tools for Conviviality” for crucial perspectives on our tool-based species.

I would rate him quite high as a “problem finder”.

Illich’s own words about the problems of schooling:

The pupil is thereby "schooled" to confuse teaching with learning, grade advancement with education, a diploma with competence, and fluency with the ability to say something new. His imagination is "schooled" to accept service in place of value. Medical treatment is mistaken for health care, social work for the improvement of community life, police protection for safety, military poise for national security, the rat race for productive work. Health, learning, dignity, independence, and creative endeavor are defined as little more than the performance of the institutions which claim to serve these ends, and their improvement is made to depend on allocating more resources to the management of hospitals, schools, and other agencies in question.

What does Alan Kay think of James Kunstler’s “The Geography of Nowhere”?

Summary: worth reading overall, could be more aimed at why people/society are the way they are rather than just the nutty, even crazy things people have done and are doing without the notion of consequences.

This book is on Kindle (for $0) and I encourage taking a shot at it.

Why did so many of the innovations at Xerox PARC not make it into mainstream computing until several decades later? What lessons can future innovators learn from this?

It’s worth looking at the history of the Ethernet. It was of interest to the outside world because there wasn’t a workable solution to “local area nets” in the 70s, and the Ethernet worked nicely (if a bit oddly to conventional thinking). Still, it had a pretty straightforward path involving other companies (e.g. DEC) and making a new company (e.g. 3Com), and doing considerable engineering on connectors, speeding it up a little, etc.

The GUI was even simpler: to see it was to get the basic idea — a bitmap display and pointing device more or less required, along with enough memory and processing power to do the rendering. This required an executive with influence (Steve Jobs) to see the power of the idea (this happened 6 years after the invention of the Alto), but the Lisa had a good version of the GUI about 3–4 years after Apple decided to go for it.

It’s interesting to note that these two inventions did not have an alternative viable form, so there were niches they could fill. Note that both got easier and easier to do as Moore’s Law expanded.

As an extreme example in the other direction, we could look at some of the programming practices at Parc, quite a few of which have not been generally adopted even to this day. There is a lot of evidence that the main reason for the difference was that everyone in computing already thought they knew how to program, but it was clear they didn’t know how to do a LAN or GUI.

My bottom line is that I think there is a general aversion to really learning new things in the outside world, so: NIH! partly from misplaced pride, partly from faddism, and partly from laziness.

The ARPA/Parc community was generally better at being willing to learn and use most already existing good ideas, and especially to share new results. I think this might have been partly due to the long history of this community, which stretched back to the WWII years.

Why was Charles Babbage unsuccessful in creating a working mechanical computer, when the Jacquard loom had already been invented?

It’s interesting to compare the recreation of “the difference engine” with the Jacquard loom. The approach to engineering and practicality could hardly be more different. The difference engine was much simpler than the analytical engine, but could barely be cranked because of the amount of backlash in the scheme (I’ve cranked both recreations …).

image

Of special note is the contrast between the many chains of base-10 gears in precise alignment in the engine vs. the “rattley ad hocness” of the wires and guides of the loom (below). Jacquard’s approach was to “let things flop around” — basically to be noisy when it didn’t count — and to get preciseness only when needed by getting the flopping parts — usually wires — to go through guides when precision was needed.

image

The bottom part is a pretty conventional loom. The left side is the card mechanism. The top is the card reader that — nice and simply — uses the holes in the cards to shift hooks for the many wires that reach down to pull up or not the warp threads. Take a look at

https://en.wikipedia.org/wiki/Jacquard_machine

and

https://www.youtube.com/watch?v=OlJns3fPItE

to get a sense of the truly remarkable set of inventions and mechanical approaches that allowed these to work — early on with just one operator power — at the rate of about 1 or 2 cycles per second.

Babbage could see that the card programmable loom could also be programmed for other things, but was unable — unwilling, probably — to work out what should be done downstream. It would probably not involve gears, would not be base 10 for most of the mechanism, and would still be very difficult to make work. For example, I think both the engine and the loom approach would need some form of restoring logic in order to work — this can be done mechanically — and not easily at all — and, for example, is much easier to do pneumatically.

The analytical engine was thought about ca. 1837, just about the time the “Barker Lever” was invented in England as a pneumatic force amplifier mainly used to “power assist” the key action in large pipe organs. This is done by using a weak pull to open a valve for much more powerful air that can then do the work (usually opening a much stiffer valve).

The greatest interest and early use was in France with the first large instrument completed in 1843. The “trackers” on these organs are rather like the “wires” of the Jacquard loom, and it is possible to see how a device that uses mechanical wires can be given intermediate power boosts via the Barker mechanism.

A pneumatic computer was certainly possible using the technology of the mid 1800s, but there is no record of such a device.

I think that Babbage lost out because of his drive for precision rather than logic, and though he was a terrific mathematician, he was not in the same class as the great engineers of his time.

It’s also worth pondering the likely negative influence of the difference engine over the analytical engine. A perspective to consider is that the difference engine “almost works”, in that it can successfully calculate polynomials by cascaded addition, using a scheme that is on the order of an old desktop calculator.
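
To make the “cascaded addition” idea concrete, here is a small Python sketch (my own illustration, not Babbage’s mechanism): once the starting value and its finite differences are loaded, every later value of a polynomial comes out of additions alone.

    # Tabulate a polynomial by repeated addition of its finite differences.
    # For a degree-n polynomial the nth difference is constant, so no
    # multiplication is ever needed once the registers are set up.
    def difference_engine(registers, steps):
        regs = list(registers)          # regs[0] = p(0), then the differences
        values = [regs[0]]
        for _ in range(steps):
            for i in range(len(regs) - 1):
                regs[i] += regs[i + 1]  # each register absorbs the one below it
            values.append(regs[0])
        return values

    # p(x) = x*x + x + 1: p(0) = 1, first difference 2, second difference 2
    print(difference_engine([1, 2, 2], 5))   # -> [1, 3, 7, 13, 21, 31]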

The problem is that the scheme doesn’t scale mechanically, yet it is how Babbage was thinking at the time. With his new thoughts about programmability, he also needed a very different physical paradigm to make a programmable engine. The Jacquard loom had more than the cards to offer: what was so interesting about it was how it handled much larger scales of information, noise, and mechanisms. The difference engine made it difficult for him to discard it in favor of what the new problem needed.

Hi Alan, I am a huge fan of yours. I'm just a regular computer/phone user who is trying to get a job in the industry, but I want to help advance computing in the ways you described. What can I do? I'm interested in utilizing today's powerful hardware

I suggest looking at the idea of “What Is Actually Needed” (WIAN), which I’ve written about in Quora, and occasionally described in talks. The center of the idea is that just trying to improve something often doesn’t take enough context into account, and may wind up making the larger situations worse — especially if the larger situation is already a bad idea.

One part of this process is to learn how to really criticize — especially that which has been around long enough so that it seems “normal” and “tested by time”.

Another part is to find the WIAN threshold, which is often qualitative. For example, WIAN might be something radically different than a “better old thing”. One route is to get sensitized to seeing “almost a new thing”.

Computing is full of “unfortunate old things” that computer people are happy with repainting (for example, the standard OSs and the WEB). It’s worth trying to understand what went wrong to allow “containers” in their current form to be considered as “reasonable”. Why are most people still programming using the mostly weak ideas of the 60s?

Etc.

In order to find which knowledge you need to learn and invent, you first have to imagine the qualitatively better contexts that require that knowledge.

What is the list of civilization's most "powerful ideas"?

I think the top of the list is

There are powerful ideas

From here on, they come in many shapes and sizes. For example, one of the biggest is

Human brains are not great for thinking, but can be taught how to think much better

This is part of the fact that raw cleverness (IQ) can’t get far without Knowledge, and Knowledge can’t get far without Perspective/Context. Another way to look at this is that our main improvements have been in method — for example, the powerful idea:

Science

Another powerful idea is

Look for powerful ideas: try to figure out what the phrase means …

Years ago Seymour Papert and I tried to make such a list, and had quite a bit of trouble organizing it. We concluded that while articulating them was good, e.g.

When something is complicated, find how to divide it into parts and try to understand the parts: rinse and repeat

it was number 4 that was even more powerful …

and so forth … (you can see why these are difficult to organize!)

Where does Alan Kay draw the line between tinkering and inverse vandalism?

I don’t think it matters where I draw the line, but having a real sense of “inverse vandalism” (making things just because you can) can be very important in an age where the industrial revolution can not just make in quantity, but can oversupply.

One key idea is that we don’t have curbs genetically built in for many things (from e.g. sugar to fantasies/stories, social interests, etc) because these were all naturally limited. But they turn into “legal drugs” or encumbrances when overproduced and over distributed.

It’s striking, and sometimes fun, to go to a large flea market (like Portobello in London) and muse that everything there was not just the product of tinkering but was also once mass produced.

I don’t think tinkering is the main problem (it is very useful in so many ways). For tinkering, a big problem is to confuse it with engineering (they should work together). The larger problem is to avoid overproducing underthought things that can “redefine ‘normal’ lower”.

What is considered an early version of a graphical user interface (GUI)?

There were a number of excellent GUIs before Parc (most of which provided ideas we used at Parc). My favorites were Ivan Sutherland’s Sketchpad system (ca 1962–3) and the RAND Corp’s GRAIL (GRAphical Input Language) system.

This research at RAND was part of the “DoD community”, and GRAIL was yet another ARPA project.

GRAIL was a follow-on to the wonderful JOSS system at RAND, and a leap from a typewriter-based interface to one that was completely done by gesture. A fairly true story is that the wastepaper baskets (and also notebooks) were examined at night to determine some of the processes used by RAND analysts. They found a lot of flowcharting and diagrams, and decided to try to do a computer based flowchart editor.

To do GRAIL, a very good tablet had to be invented from scratch (Ellis: 1964 — the same year as the mouse), a terrific hand drawn symbol recognizer was done (Groner: 1966), and the larger whole system (ca 1968). To make things more fun and difficult — but realistic — they decided to program the entire system in itself.

image

There is a whole movie done by RAND, but it is worth looking at the excerpt I’ve used to show in talks, and then looking at the full movie. This is demoed by Tom Ellis, but I talk over him to point things out to the audience:

The 2 minute Excerpt

https://youtu.be/QQhVQ1UG6aM

Next, the whole movie, which shows many more details, including the astounding fact that the flowchart programming ultimately rested on a foundation of IBM machine code for the huge 360/44 mainframe used for this. The tablet itself was handmade at RAND, and you could get one for about $18,000 1960s dollars. The display system was over $180K in 60s dollars. About 1/3rd of the single user mainframe was used for the gesture recognizer.

The Movie

https://youtu.be/2Cq8S3jzJiQ

This system was many years ahead of its time — to the point that something as good hasn’t been done since! It was by far the most intimate feeling GUI I’ve ever used.

Some of the YouTube comments today failed to understand just what this research was after — which was primarily “human-computer symbiosis” and how the computer could be “inserted” as an amplifier of thought without feeling like a drag or distraction. I.e. can everything that is great about pencil and paper doing and thinking be simulated successfully by the computer (and can some useful magic be added to this)?

So the flowchart part of this (and machine code part of this) was ancillary to the main goals. The larger intention was to have the symbolic language be more like JOSS (a higher level language), but then they thought it would be quite a demo if they did it in itself in IBM 360 machine code …

Part of the idea here was to see how far one could go in “connected smoothness” (and in the 60s). And some of this was very related to the work of Engelbart and his NLS group at SRI at the very same time.

One of the connections was the problem of how to get text characters to be part of the visual world in a reasonable way. If you are working and thinking graphically and using pointing devices etc then it is annoying and distracting to have to go to a text keyboard to type. Engelbart solved this with a “chord keyboard” for the left hand (for typing commands and text without interrupting navigation, etc.) and GRAIL did this with character recognition. Both of these were slower than a regular keyboard, but faster when the need was just for a few characters.

An important side note is that the Gabe Groner gesture recognizer for GRAIL actually pipelined the recognitions, so the human did not have to wait for the previous character to be recognized!

The Engelbart system also provided a typewriter keyboard, which you could go to for larger paragraph-sized volumes of text. So hands were “out” (for navigation etc) or hands were “in” (for inputting larger volumes of text). This worked well.

image

The Engelbart NLS Lap Control Setup

Some of us thought that this would work for GRAIL even better (with the mouse being replaced by the stylus). We also thought that two-handed UIs were generally better for any system one was going to spend hours each day on.

What does Alan Kay think independent researchers in computing should do for money if commercialization was part of the problem? What if "good funding" for research isn't available, but your ideas are worth pursuing?

I don’t know. There is no lack of money, especially when one looks at what the computer part of Xerox Parc cost, or even what the ARPA-IPTO community cost.

In my own experience, I and the other researchers within ARPA/ONR/Parc, etc knew we were in a wonderful situation. Our appreciation went up by a thousand fold after the great era ended in the early 80s, and commercialization of computing went into high gear.

One of the reasons the ARPA etc community was so productive was the level of funding it gave to universities and students, and the students grew to become PIs and mentors in the community. This started to fall apart after the ARPA funding ceased, and the pretty top down NSF became the main funder (and of much smaller projects). I’ve always thought that ARPA’s funding of large groups, often whole departments, in universities made a big difference (larger goals, less rivalries, no tenure pressure, etc.).

I was able to get smaller funding that would allow some things to be done with small groups, but I never was able to find “critical mass funding” that would allow a big area to be explored deeply and then made practical. This got more and more difficult and painful over the years, and I finally ran out of energy around age 78 or so.

A key point here — again — is “community”. I’ve always had roughly the same sized groups, but we were much more productive in the large at Parc because of the synergy with others.

As an old fogey now, it’s worth pondering the roles of business, government, and universities. In theory — and often in fact in the past — the universities were the best places to do breakthrough work — in part because of students, and in part because universities are supposed to be primarily about ideas. A good source of funding is the government, in part because it is supposed to be about the progress of the whole society, and in part because it can and does do various forms of redistribution of wealth to benefit the society as a whole. Businesses as they are currently constituted generally have much too small goals — their setup is not psychically good for making real progress.

If we look at these three, we can easily come up with great improvements not just on the current situation, but also on the past. For example, the “golden age funding” for computing came mostly from hot and cold wars; the government/citizenry has a hard time with “research for progress” per se. Universities became businesses from the baby boom, and essentially became more and more like trade schools and sellers of degrees. And businesses in the US form of capitalism and its markets have a hard time seeing that they must have higher aspirations than just making money.

Why wasn't the GUI by Xerox PARC or Xerox Star developed earlier? There was Sketchpad in 1963, which was a predecessor of a GUI, 20 years before.

The other answers are pretty good.

Let me first point out that the Alto at Parc with its bitmap display and overlapping window interface was done in 1973, which was only 10 years after Sketchpad, and for practical reasons required the first integrated circuit RAM (just 2 years after the Intel 1103 appeared in 1971) instead of the more expensive cores. Still, the Alto was expensive in 1973 dollars (and really expensive in today’s dollars) but over the years we made almost 2000 of them to serve as a “time machine” so we could work in the future rather than the present or the past.

Second, there were some truly excellent GUIs done in the 60s after Sketchpad and before bitmap displays, for example GRAIL at RAND Corp, which used a terrific tablet and gesture recognition to go along with great UI ideas. It had to use calligraphic CRT displays. Another system with many great ideas at the same time was NLS by the Engelbart group at SRI. Also a calligraphic display, but used video as a scanning conversion multiplying device, and had multiple panes, etc.

How could Xerox come up with advanced innovation tech (i.e. GUI) since their managements or leaders know nothing?

First, some of their management did know things. For example, the new Chief Scientist they got from Ford — Jack Goldman — was a first class physicist who also had a good feeling for what large companies needed. He urged Xerox to set up a Long Range Research Center away from corporate headquarters that would look at doing “critical inventions” in a wide range of sciences and technologies (including computing).

This wound up with another physicist — George Pake — being recruited from Wash U (StL) — which happened to be an ARPA project — to run Parc. The ARPA connection found Bob Taylor to set up the computing research, and he had funded both the ARPAnet and a wide range of top young PhDs, whom he recruited starting in late 1970.

By early 1971 he already had a critical mass of some of the best computerists in the world. He was able to get them because Congress inadvertently took away much of the “nice funding” for ARPA computing, and Taylor was able to convince them that a company could possibly fund the work as well.

Part of the deal was that Xerox management was not to be in top down control of Parc — nor was Taylor — instead the top scientists “followed their instincts” and cooperated to do the “advanced inventions” (most were beyond “innovation”).

In the movie “Hidden Figures” is the depiction of the delivery of an IBM computer. Did the components actually not fit into the computer room as depicted in the movie, or is that “artistic licence”?

My original answer (that the movie scene was “unlikely”) starts with the third paragraph. I then got interested in the movie and liked the characters and the acting, but didn’t think the writing and directing did justice to the historical characters or the science. Comments on the movie follow the original answer. Then I decided to get the book “Hidden Figures”, which I found to be much better than the movie (comments on this follow). And then I decided to get more historical documents, especially including the autobiography of Katherine Johnson.

Bottom line: the larger subject matter, story, and people are really important and need to be widely told. More accuracy could have been achieved with more work, and would have made the portrayals even more important.

---

Original Answer: I think the sequence mentioned in the question occurs around 0:41 in the movie with the delivery of an IBM 7090 mainframe. The depiction seems quite unlikely to me for a number of reasons.

The largest reason is that IBM historically used an army of support people for every aspect of their products. “Support” was their defining feature.

So everyone handling the equipment, especially at early stages, would be IBM personnel, and there would be lots of them. And planning was IBM’s middle name. Their whole deal was not to make great computers but to be the most comfortable manufacturer for their clients.

In those days you got permanent people with the computer after the installation who would take care of every form of maintenance, including fixing the machine and training both operators and programmers. (This was the case even with the tiny IBM 1401 we used in the Air Force ca 1961.)

Movie Comments

Also, the later “we can’t get this machine running” sequences seem quite false to me. As do other sequences involving this machine. That just isn’t the way IBM operated (I didn’t like their architectures, but I admired their processes).

Other scenes were just silly. For example, putting the oscilloscope probe on a different wire never seen before and getting a positive result (nope!). This is too bad because what these talented and skilled ladies were actually able to do was really good and very important — but the movies are a tough place to explain reality (same with the Turing movie, which never really touched what Turing did, how the Bombe was actually done, or what he was most famous for).

In the very beginning of this movie there is the question “Isn’t there anyone around here who knows analytic geometry?” This in front of the head engineer, who certainly does (as would all the other Langley mathematicians, engineers and scientists involved (of which there were already hundreds)). This was worrisome to see.

Another side comment is that the 7090 was a ca 1960 transistorized version of the physically even larger vacuum tube 709 mainframe (which they would already likely have — they actually had a 704 in 1957 (see below)), so adding in a new mainframe and dealing with it would be nothing new. Similarly, FORTRAN was launched in 1956, and ran on most IBM computers — both large and small — including the predecessor 704, etc.

Poking around to try to find a history of computing machines at NACA, I found this picture of an IBM 704 (even more a predecessor of the 7090) at NACA in 1957:

image

Note, there are both a male and female operator. The book reveals that NACA installed an IBM 650 (a rather fun drum oriented vacuum tube computer) in the mid-50s and started to calculate trajectories etc with it. The manual they would have used would look like this:

image

The book also reveals that the 7090 in the movie was actually one of two that went to Goddard Space Center in DC, and the 704 above was the computer at NACA (and there was no 709 in between). Also, the tape drives didn’t look right in the movie. IBM would have been using 729 drives in 1961 (too much of a nit).

Another side comment is that the trajectory for Alan Shepard’s flight

image

has to get the capsule downrange in a small enough area to be picked up by the Navy. It’s worth noting that the V2 rocket in WWII traveled about the same distance (about 235 miles to London) and as accurately (it generally targeted about a 10x10 mile area). This still had to be calculated for Project Mercury, but it was not a new problem — and the path was not actually a parabola (see below).

About race and gender

The movie is set in 1961 in Virginia — the very same time period I was in the US Air Force stationed in Waco, and then Randolph AFB, San Antonio Texas. My best friend was the best programmer at Randolph, and was a supersmart black guy from New Orleans. We once got refused any service in a bar in San Antonio because he was black — I was from the Northeast so it took me a while to understand what was going on — but inside of Randolph there was no hint of any prejudice.

Nor were there any hints of prejudice towards women: my boss was a woman, and an extremely talented programmer, and many of the programming bosses at Randolph were also women.

However, all the women programmers on the base were civilians, who were mixed in with (fewer) male civilians. So we could guess that the Air Force was lagging in how its military women were used (in looking this up, I found that women were in a separate organisation called the Women’s Air Force — which was “kind of part of” the Air Force — until 1973, when this separation was dissolved).

I think the same situation was pretty much the same within government science/technology institutions at the time. Some of the best SAGE (air defense) programmers in the 50s and early 60s were women — Margaret Hamilton was one of them: she soon went on to write software for the Apollo Guidance Computer at MIT, and quickly wound up as the head of all the operational software for the AGC (and eventually wound up winning many prizes for her SW that saved the first moon landing mission). She recounts “some, but not a crushing amount, of male resistance”.

So I can believe that the “out in Virginia” sequences in this movie still had lots of prejudice against blacks in 1961, but the “inside NASA” sequences with both black prejudice and gender prejudice don’t ring true to me from my Air Force experiences. What counted back then, in my direct experiences, was ability. If you could show it in the Air Force, it didn’t matter whether you were black or a woman or both.

I have a pretty strong feeling that the same was generally true within NASA (for example, I really doubt that NASA had segregated restrooms whether in the south or not) — the Air Force at the same time (1961) definitely did not have segregated anything, even in Texas. Of course, NASA was civilian, and there could have been differences.

I felt that the movie had some important things to say, but didn’t do a strong job with them. For example, the more true to life contrasts between the prejudiced outside world, and the much more equitable inside world could have made great realistic drama and impact.

On the other hand, the principal actors are able to transcend the material they were given with high quality performances that most of the time portrayed “higher points” and rendered the many flaws in the writing and directing irrelevant.

Another answer (by Donjanelle Robinson) provides additional notes of errors and facts:

https://www.quora.com/What-are-the-historical-inaccuracies-in-the-movie-Hidden-Figures

The Book “Hidden Figures”

The book is many levels of quality higher than the movie, not just in historical facts, but in quality of writing and in depth of coverage. It’s worth realizing that a movie script is not just limited in what fits on each very spare page, but also in length (they average about 1 page per minute of film, so the script would be about 126 pages).

By contrast the book is well over 300 pages with considerable background information, and with its main story starting much earlier in WWII with the expansion of NACA’s (not yet NASA) pool of mathematicians and “computers” (human beings, most often women, with mathematical backgrounds who used mechanical calculators as prime aids) — to include black women.

The complexity of segregation in each era is well told in the book. In many ways, this is the center context of the book. For example, the state of Virginia had many segregation laws that were the opposite of what was supposed to be the case for federal employees, and what resulted were tugs and pulls back and forth (in fact, for a variety of reasons, including Senator Harry Byrd, Virginia could easily be rated as one of the most backwards states regarding segregation). It’s brought out in the book that many of the technical men were from the north rather than the south, and this turned out to be a positive factor, both for women and blacks.

Also, in real life there are often many people involved in events worth making a movie about. These become also “Hidden Figures” who are omitted in a movie version (and in this movie, having three distinct female protagonists also made it complicated even with its over simplifications). There are many more key figures in the book, and all are worthy of attention.

I wouldn’t omit any of the great number of facts in the book, but I found myself making little charts to help me keep more clear who was doing what, when, and where. I think I would have preferred longer sequences staying on single characters.

I was sorry that the author didn’t make more of an attempt to explain the nature of the work, for example what “analysis” means mathematically, what “numerical analysis” means to continuous mathematics, and why having “computers” helps. This could be done without requiring the readers to understand much more than how difficult-to-handle math relationships can still be approximated numerically. There isn’t much more explanation than in the movie (but less of a feeling of “rabbits out of the hat”).

For example, everyone can understand that a smooth curve of any shape can be approximated very closely by a sequence of connected short lines that are as close to the curve as possible (to get closer you make the line lengths shorter). This idea was used by Newton in the 1600s, and a nice way to do it was found by Euler (this is “Euler’s Method”, which the book doesn’t explain). A big deal here is that quite a bit of all the computations done at Langley would have treated difficult problems by approximating them with line segments and solving for each line (this might require several thousands of calculations).

For example, Seymour Papert used a mechanical robot “turtle” with a pen in it to teach children about parts and wholes, analysis and synthesis, and forms of analytic geometry. Here’s what 10 short line segments look like when hooked together into a polygon; the second figure looks like a smooth circle curve but it is really 360 short straight line segments. The LOGO program for this is repeat 360 [forward 1 right 1].
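
Here is the same idea in a few lines of Python (a sketch of my own, equivalent in spirit to the LOGO program above): the “circle” is nothing but many short straight steps, each followed by a small constant turn.

    import math

    def turtle_polygon(sides=360, step=1.0):
        # "repeat sides [forward step right 360/sides]" done with coordinates
        x, y, heading = 0.0, 0.0, 0.0
        turn = 360.0 / sides
        points = [(x, y)]
        for _ in range(sides):
            x += step * math.cos(math.radians(heading))
            y += step * math.sin(math.radians(heading))
            heading += turn             # a small constant turn at every step
            points.append((x, y))
        return points

    print(len(turtle_polygon(10)))      # 11 points: a 10-sided polygon
    print(len(turtle_polygon(360)))     # 361 points: looks like a smooth circle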

image

The differential equation for this idea (in English) is “change of curvature = 0”. This means the curvature of a circle is constant — and we can also easily see this from the program, where the step and the turn at each step are constant.

“Analysis” means “take apart” and we can see how all this works if we are trying to understand how a complex shape — like that of a wing — interacts with air at different speeds.

You can use a board for a wing but the streamlining makes it much more efficient. Often the streamlining was achieved by watching smoke streams in a wind tunnel and then carving wood models more smoothly to eliminate turbulence. The final shape might be difficult to handle algebraically, but (second picture) we can approximate it with line segments. This is “taking apart” (“analyzing”) the curve into separate simpler pieces.

image

The third picture has the arrowheads removed, and the fourth picture has many more straight line segments, each of which can be treated separately with easier mathematics (still lots of things to worry about). The separate computation of each little segment of a complex curve was a large part of the work of the human computers.

This is both why “math can be lots of fun” and “made very understandable”!

The same process was done for the flight paths of the capsules. For example, the Alan Shepard flight was not actually a parabola, but a much more complex curve because of the various resistances of air during the climb out and return (similar calculations had been done by hand for artillery shells for many decades, and one of the main jobs of the original ENIAC was to calculate trajectories). This was a more complex calculation for space vehicles because they wanted them to be as streamlined as possible going up, and as blunt as possible coming back. It’s the green, more complex curve below that would be approximated with line segments. Note that each line segment is a hint for what the attitude and speed of the spacecraft should be at that time and place.
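
A toy Python sketch of that stepping idea (all numbers invented, nothing like the real Mercury calculations): the flight path is built out of many short straight pieces, and a simple drag term is what bends it away from a pure parabola.

    def trajectory(vx, vy, drag=0.001, g=9.8, dt=0.1):
        # Step position and velocity forward a little at a time (Euler's method).
        x, y = 0.0, 0.0
        path = [(x, y)]
        while y >= 0.0:
            speed = (vx * vx + vy * vy) ** 0.5
            vx += -drag * speed * vx * dt            # drag opposes the motion
            vy += (-g - drag * speed * vy) * dt      # gravity plus drag
            x += vx * dt
            y += vy * dt
            path.append((x, y))
        return path

    p = trajectory(300.0, 300.0)
    print(len(p), p[-1])   # how many little segments, and roughly where it lands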

image

I also wish the book early on had not stooped to the fake mystique of “faster than the speed of sound”. Most bullets go faster than sound, and more interestingly, in the 40s, the V2 rocket went much faster than the speed of sound and was not at all torn apart.

The book is at its best in providing a large scale picture of many dimensions of prejudice, segregation, tolerance, determination, etc., in NACA which morphed into NASA after Sputnik in 1957. This portrayal was done enough better than the movie to make up for its lack of explanation of what the women computers were doing (and also to get a more accurate timeline on the many events). The book shows that a lot of clean up changes had already happened by the time the events in the movie started.

All this got me to look at more direct sources, including an autobiography Katherine Johnson wrote for junior level readers “Reaching For The Moon”.

Katherine Johnson autobiography: “Reaching For The Moon”

This book is pure gold on so many levels, and I wish the movie (or a movie) had been made from this story.

<more to come>

---

This picture is of Melba Roy Mouton, Program Production Section Chief at NASA Goddard Space Flight Center in 1964

image

What is the significance of Butler Lampson's work?

A good reply is much too large for a Quora sized reply.

Basically: Butler is one of the most significant computer scientists in history — and has been one since he moved from physics to computing almost 60 years ago. He was always a “principal scientist, investigator, and thought leader” even as a grad student. He is a quintessential example of the word “brilliant” (and the term seems to have been coined for him).

image

In the 60s while hearing a rapidly delivered perfectly formed Butler talk, a fellow grad student turned to me and said “You know, it’s just a privilege to know someone like Butler”. When asked about “computer science” and “computer scientist” I’ve said “Look at Butler to see an example of a top computer scientist — it’s hard to put into exact words, but he is one”.

The list of his accomplishments is long, and so is the list of awards. Much of his significance has been in how he was able to understand, invent, and influence the large world of systems of many kinds. These include a number of significant operating systems (including the SDS-940 and Tenex), computers (including the Xerox Parc MAXC and Alto), networks (including the Ethernet and the Internet), and programming languages (including Mesa).

He is both a great thinker and a great doer, and both of these plus his larger than life personality almost literally dragged the whole field forward, and made all the rest of us into better computer scientists. I.e. a kind of “magical person”.

I will return here periodically with a little history and a few stories, but they are really commercials for readers to look at larger accounts of Butler’s larger than life history.

The citations and bio for his 1992 ACM Turing Award (the highest award in computing) are good to start with:

https://amturing.acm.org/award_winners/lampson_1142421.cfm

Next, the Wikipedia article —

https://en.wikipedia.org/wiki/Butler_Lampson

Pretty much all of his more than 100 papers are not just well worth reading for content, but also for savoring his crystal clear style.

https://www.microsoft.com/en-us/research/people/blampson/publications/

---

Early Butler Projects

Butler got an undergrad Physics degree at Harvard ca 1964, and traveled to UC Berkeley for a PhD. As he later put it, he “walked in the wrong door” and found students (one of them the legendary Peter Deutsch) debugging what would be called the “Project Genie” time-shared computer and OS sponsored by ARPA-IPTO. He immediately pitched in, and essentially never left this room. Soon, he was the main designer of the OS, a powerful “thought-force” for the HW modifications and whole project, and quickly wound up as an ARPA co-PI.

The resulting system was so successful that ARPA pushed SDS to commercialize it so that others could buy it (as the SDS-940). It was the computer used by Engelbart for “the mother of all demos” in 1968, and by TYMSHARE Corp to launch the first time-sharing business. And the design was recapped again for the next generation as the TENEX OS (by BBN) for the DEC PDP-10, and again heavily used in the ARPA community.

The design and approach to the design were both highly significant. The base machine was about 0.5 MIP and 64K 24-bit words and a bit shaky (as so much of the HW was then), so the OS had to be comprehensive, lightweight, and as fail-safe as “lightweight” could be. I had started grad school in late 1966, and found the system to be a revelation of “sweet parsimony and balance”.

Also, Butler’s “hobby project” on this system was “CAL”, an optimized version of JOSS, which constituted a “sweet spot” for interactive computing by the larger public.

All this was essentially Butler’s PhD project. Via his obvious brilliance, and in the midst of the ARPA-IPTO community, he was allowed to run free by his advisors: supreme computerists Harry Huskey

https://en.wikipedia.org/wiki/Harry_Huskey

and Dave Evans

https://en.wikipedia.org/wiki/David_C._Evans

who a year or so later at Utah birthed modern 3D graphics (at Utah I was also lucky to have Dave as my PhD advisor).

A lot can be learned about design from looking at the Project Genie OS and subsequent work (hint: look at his succession of papers and commentaries). I’m trying to work on the “significance” part of the question, so I’ll mention just two of the many interesting insights and features in this system.

The first is that I think it was pretty much the first OS that “was designed to fail”: contrary to most code that was assumed to work (and didn’t), Butler knew that there would be both SW and HW bugs that would cause crashes (and this is really a bad thing for a time-shared system). So a percentage of the already meager resources of the HW was used to mark pages as “dirty” or “clean”, and to continuously try to write pages out to the secondary storage device so that they were “clean”. This meant (a) that bringing in a page could almost always find a clean frame in core to put it in (without having to write anything out beforehand), and (b) that a crash generally left most of the end user’s pages in a clean and written-out state, and this along with other provisions almost always made crash recovery very swift.
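
A rough Python sketch of that policy as I understand it from the description above (the names and structure are mine, not the actual 940 code):

    class PagePool:
        """Keep writing touched pages to disk in the background so that
        most frames stay 'clean': eviction then needs no write, and a
        crash loses only the few pages that are still dirty."""
        def __init__(self, nframes):
            self.frames = [{"data": None, "dirty": False} for _ in range(nframes)]

        def touch(self, i, data):
            self.frames[i]["data"] = data
            self.frames[i]["dirty"] = True      # changed in core, not yet on disk

        def background_sweep(self, write_to_disk):
            for f in self.frames:               # ran continuously in the real OS
                if f["dirty"]:
                    write_to_disk(f["data"])
                    f["dirty"] = False          # safe to reuse, safe to crash

        def frame_for_incoming_page(self):
            for i, f in enumerate(self.frames): # a clean frame costs nothing
                if not f["dirty"]:
                    return i
            return None                         # rare: must write one out first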

The second had to do with “training programmers” to pay more attention to taking care of their working set. The scheme had to do with the HW maps forcing a working set to be smaller than the physical core storage. This allowed the next user’s working set to also be in core ready to go. The wrinkle was that a programmer could predict to the OS what pages would be in their working set for the next time quantum, and if the prediction held, the OS would reward them by running them on the fast queue. If the prediction failed (and an unpredicted page fetch was required) then the programmer was punished by being run on the much slower queue.
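
And a toy version of the reward/punish wrinkle (again just an illustration of the rule as described, not the real scheduler):

    def queue_for_next_quantum(predicted_pages, pages_actually_touched):
        # Stay inside your own prediction and you run from the fast queue;
        # one unpredicted page fetch and you are demoted to the slow queue.
        if set(pages_actually_touched) <= set(predicted_pages):
            return "fast queue"
        return "slow queue"

    print(queue_for_next_quantum({1, 2, 3}, [1, 3]))   # fast queue
    print(queue_for_next_quantum({1, 2, 3}, [1, 4]))   # slow queue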

Both of these — along with other special characteristics — allowed the 940s to have remarkably efficient performance. It could run several dozen users with 0.5 MIP and 192 Kbytes of RAM — and it was also able to be the workhorse for the Engelbart group’s amazing “Mother of All Demos” in 1968.

Next

If I were to write a next section here, I think I’d talk a bit about Butler’s work and influence on design, safety, security, confinement, and meaning in systems. An important idea he emphasized from early on is “In computing, design rules change dramatically every few years”. A lot of this is from both Moore’s Law scalings, and from the amount of learning that can happen in a few years.

This means that trying to carry old ideas and techniques forward is fraught with disaster, and if done, has to be done very carefully. Butler followed his own “hints” and tried to do new designs that fit the new situations for each new system he worked on. Often this had to be done with less than ideal HW from manufacturers. With the SDS 930 that became the 940, for Project Genie, they were able to make a new memory mapping system, and found very successful ways to get considerable power from what seems like meager resources. “A User Machine in a Time-Sharing Environment” covers some of the most important ideas and insights here.

On the later CDC 6400 at Berkeley, they were pretty much restricted to using the existing HW while trying to come up with a radically new way to structure, protect and make an OS that used the idea of “capabilities” — a secure unforgeable reference privilege with limits — but without having HW support.

“On Reliable And Extendable Operating Systems” (1969) gives a clear account of the thinking and resultant designs of this unusual system design.

<more to come>

How does a computer know what to do with binary code 01000001? How does the computer know this means “A”? Who told the computer?

To get started thinking about this, it would be a good idea to not use the word “know”. A better really simple analogy is how keys and locks work, where the lock is set up physically to accept or resist the carved patterns on keys. We generally would not say — except perhaps in poetry — that the lock “knew” the key.

The so-called weak “AI” of today cannot be said to “know like a human”, so we can leave out “know” here also.

A good starting analogy here is how Morse code has been used in the telegraph for more than 150 years. Lots of methods were tried, but things worked best if the simplest signals were transmitted over the wires (long ago they picked long and short noises). Then they had to come up with patterns of these to stand for the letters and numbers used by humans. For example, the pattern for “A” in Morse Code is “short long” or “dot dash” (and written as “. -”).

To send the letter “A”, the sending human taps out “short long” using the telegraph key — this sends a short and long impulse of electricity over possibly many miles to an electromagnet and “clicker” which turns the electric pulses back into sounds that the receiving human can hear. If this person knows that “short long” is supposed to mean “A”, then they can write it down.
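
A tiny Python sketch of that shared agreement (only a few letters shown): nothing “knows” anything; both ends simply use the same table of patterns.

    MORSE = {"A": ".-", "O": "---", "S": "..."}

    def send(text):
        # the sender taps out the pattern; the receiver looks it up the same way
        return " ".join(MORSE[letter] for letter in text)

    print(send("A"))     # .-
    print(send("SOS"))   # ... --- ...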

On a computer you can use “Voltage and no Voltage” for the two codes. These are often written 0 and 1 (even though they are just “Voltage and no Voltage”).

Imagine a panel with 8 switches. Can we connect wires so that only the switch settings that correspond to 01000001 will turn the light on? How? We can use double-throw switches in series:

decide that an up switch will signify “0” and a down switch will signify “1”,

image

and wire them so that:

up=on dn=on up=on up=on up=on up=on up=on dn=on

image

And this will only pass current to turn on the light when the switches are thrown correctly (something very like this is used in computers for decoding a binary number into unary selections). Can you see how this is like a “lock” that can match a particular “key”?
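
Here is a software analogue of the switch panel (a sketch of my own, though the matching idea is the same one the hardware uses): the “wiring” is a fixed pattern, and the light goes on only when every switch agrees with it.

    PATTERN = "01000001"            # up = 0, down = 1 for the eight switches

    def light_is_on(switches):
        # every switch must match its wired position, like a series circuit
        return switches == PATTERN

    print(light_is_on("01000001"))  # True: the one setting that closes the circuit
    print(light_is_on("01000010"))  # False: any mismatch breaks the chain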

Humans can then assign a pattern like this to stand for an “idea” (like “A”).

What’s fun about the computer is the extent of things that can be done just by manipulating patterns.

Most transistors in a computer act like switches that can be thrown off and on via receiving electrical signals (so computers are mostly organizations of “switchable switches”).

At some point it should be possible to make something just out of patterns that behaves like a mind — even a simple animal mind — but this has turned out to be surprisingly difficult, and is still a ways off.

How does a computer work? Explain with an example.

In the answer to the parent question, Jay Nabonne gives an excellent suggestion: to find ways to understand computers, then try to explain them yourself to others (which will generate your own words for this).

So I think your question is mostly: “How do I understand how a computer works?”

There are many answers on Quora already for this (including from me). When people ask me for something good to read to start off absolutely from scratch, I usually recommend “The Pattern In The Stone” by Danny Hillis. It is a small masterpiece.

What does Alan Kay think about the harmony-oriented programming paradigm (https://dl.acm.org/doi/abs/10.1145/1449814.1449872)?

This URL results in a 404 error (the correct URL is in a comment). This is a very short speculative paper. In the same comment is a reference to a 2 page poster, and a Master’s thesis by Kyle Martin - “Harmony Oriented Architecture” https://stars.library.ucf.edu/cgi/viewcontent.cgi?article=2765&context=etd, which has more content.

In the spirit of “Yes, and … ?” that is the center of improv and tentative thinking, I think what these papers are calling attention to — the difficulties of scaling code and systems, and the need for great improvements — should be greatly encouraged, experimented with, discussed, etc.

I think the call for ways to have looser couplings being part of the larger scaling solutions is good. And some of the citations, especially in the MS thesis, point to praiseworthy places to start. For example, I’ve long been a fan of LINDA as a way of thinking — especially as a metaphor — for how to get started, especially to try to coordinate via descriptions rather than domain labels.
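
To make the LINDA idea of coordinating by description a bit more concrete, here is a toy sketch (the class and method names are mine, loosely echoing Linda's out/rd operations; this is a metaphor, not the real system):

```python
# A toy tuple-space: processes coordinate by describing the tuples they
# want, rather than by naming each other.
class TupleSpace:
    def __init__(self):
        self.tuples = []

    def out(self, tup):
        # Publish a tuple into the shared space.
        self.tuples.append(tup)

    def rd(self, pattern):
        # Read (without removing) the first tuple matching the description;
        # None in the pattern means "any value".
        for tup in self.tuples:
            if len(tup) == len(pattern) and all(
                p is None or p == t for p, t in zip(pattern, tup)
            ):
                return tup
        return None

space = TupleSpace()
space.out(("temperature", "room-12", 21.5))
print(space.rd(("temperature", None, None)))  # matched by description
```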

I also like the goal of having the system work successfully in the presence of noise and errors.

The rub here is partly that we need to actually do implementations to vet ideas, and that feasibility studies in scaling are not easy to set up. For example, the Internet was very carefully thought through and simulated — and Parc even had its own real “internet” (the PUP network) — nonetheless, the actual making of the Internet still required more heuristics to be added to TCP/IP in order to stabilize the larger system.

Similarly, the massive over-all-Internet replicated pseudotime system — Croquet — which had its birth in the PhD thesis of Dave Reed in 1978 — had to be implemented and tested and redesigned and implemented many times before it really started to scale well.

And earlier, Smalltalk had 5 distinct designs and implementations over 10 years (the last 3 were of great practical use, and all were used extensively).

This is partly a general principle, and partly a direct consequence of the combination of scaling and degrees of freedom, etc.

For “Harmony, etc.” there is not enough practical experience for vetting that I can see.

Its fans see possible good properties (and I think there are a few in there). But I also worry about the level of solution proposed. For an example of a nit, the rhetoric about “encapsulation” misses that an object can expose any part it wishes to, while still being able to protect the parts that shouldn’t be exposed.
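
As a small, hedged illustration of that point about exposure (the class and its methods are invented here purely for illustration):

```python
# An object can expose exactly the parts it chooses while keeping the rest
# protected from outside fiddling.
class Account:
    def __init__(self):
        self._ledger = []          # internal detail, not part of the API

    def deposit(self, amount):     # exposed on purpose
        self._ledger.append(amount)

    def balance(self):             # exposed on purpose, read-only view
        return sum(self._ledger)

acct = Account()
acct.deposit(10)
acct.deposit(5)
print(acct.balance())   # 15 -- callers never touch the ledger directly
```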

I also feel that much too much of the discussion is trying to preserve “bad old ways” of doing things. I would feel more comfortable to see sample code in a new language made for making the new kinds of system proposed.

However, I would definitely sign Kyle Martin’s MS thesis because it is a good enough, wide enough discussion to count as an intellectual contribution to an ongoing discussion of very important issues.

What are some tips for a 16-year-old that wants to be a professor in computer science, specifically in type theory and programming language theory?

I think the best way to be a professor in a subject is to first become a first class practitioner in the subject, and gradually learn how to explain it and help all levels of interested people to learn it. A good doer does not automatically become a good teacher — both parts require amazing amounts of work (and I think the “good teacher” part is the hardest).

One of the signs of first class practitioners is that besides being very skilled, they can also deeply criticize any and every part of the subject. This is especially critical in computing because so far it is much too ad hoc. One way to think of this is that good criticism is opening the door to much better ideas (and at the least allows good warnings of “danger ahead” to be part of the thinking process).

In order not to get buried in what people have done and are doing, I suggest using the question “why?” instead of “what?” or (worse) “how?”.

For example: why types? An early impetus was to help compilers generate code. E.g. computations with numbers don’t semantically need to know int or float, and in many cases an algorithm will need both. Another possible use for typed variables is as documentation — this is using the variable name as a stand-in for an entity that will now mostly help human programmers. A worthwhile case to consider is what to do if someone sends us a module over the Internet that we want to make use of. What is its “type”? What are the “types” of its API? Etc. Right away we should see that we need dynamically adjusting systems/languages rather than the old-style static approaches to semantics. I.e. we need “semantic types” and the field doesn’t have them yet. A stopgap is to have a dynamic language — it can be fixed so it can’t crash — that can dynamically adjust to both changes and unknowns.
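
As a small, hedged sketch of the “module over the Internet” case (all names here are invented): instead of relying on a static declaration, we can at least check dynamically whether the thing we received behaves the way we need.

```python
# A crude stand-in for a "semantic type" check: does the arrived module
# respond to the messages we need before we rely on it?
REQUIRED_API = ["render", "update"]

def can_act_as_widget(module):
    return all(callable(getattr(module, name, None)) for name in REQUIRED_API)

class ArrivedModule:                 # pretend this came over the network
    def render(self): return "<widget>"
    def update(self, dt): pass

print(can_act_as_widget(ArrivedModule()))   # True -- accept, and keep watching
```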

The above paragraph amounts to the start of really big criticisms of most parts of computing (much of which insists on living in the past, and especially using old ideas that don’t scale).

We can see that some form of “typing” can be really useful, but that none of the currently used approaches is very good (I think most forms don’t really pay their way).

Similarly, one should ask about programming languages: why C or C++? Why Haskell? Why Smalltalk? Etc. A good answer for any of these will beg for something qualitatively better …

A good exercise — which most in computing flunk — is to be able to really criticize the WWW and especially its browsers …

I was very lucky as a grad student to have been influenced by Bob Barton — the main inventor of the B5000 — an authentic genius. In his advanced systems course he handed out a list of things for us to “read and remember”, and used the actual class time, as he said “… to disabuse you of any fondly held notions you might have brought into this classroom”.

Basically, he destroyed the existing field for us (including some of the things I knew he liked). Those who survived this class were set free of “the tyrannies of the past and present” and were now open to inventing better futures (sometimes radically better futures).

This could be tough today because even the “leading” CS departments do much too much training for jobs in existing business rather than actually helping students become real computer scientists and teaching them not just to think for themselves, but how to really think well.

Is it possible to have a computer without critical sections?

Somebody (a gremlin?) changed the question after my answer to include the word “multiprocessing” (which makes a different context).

New answer is still “more or less yes”, but with a few twists. For example, some of the earliest multiprocessing was done by “roll in and roll out” of entire memory images, so it wasn’t possible for one process to write into another process. This could be tricky, because the nascent OS code was visible to every process.

---

Original answer to original question: Is it possible to have a computer without the use of critical sections in software?

Yes, and most programs didn’t have them in the early days.

The first programming job I had was in the Air Force in 1962, initially doing conversions from punched card machines to the IBM 1401. Interestingly (ironically?), both the multiple punched card machines and their plugboards were often run in parallel, whilst the tiny (8k 6 bit bytes) and slow 1401 generally didn’t have enough memory for any kind of concurrency.

And this was also the case for the slightly larger (and also slow) Burroughs 220 in our shop.

So there were no interleaved threads that could mysteriously clobber shared variables. And no critical sections to be protected. However, interrupts acted like transient threads and could clobber anything anywhere (in this case the entire program was a “critical section”, but some care would be taken …).

And, variables were still shared in single threads, and these could clobber a value in a variable that was expected in another section of code (this is STILL one of the main causes of error even in relatively simple programs).

When multiprocessing was first introduced, the idea of critical sections was thought up as a simple way to suspend threads looking at the same section of code to avoid races between threads clobbering shared values. This was initially a practical solution, but it scaled poorly.
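
A minimal sketch of the problem and the remedy in modern Python (the counter and thread counts are made up for illustration):

```python
# Two threads incrementing a shared counter can interleave and lose updates;
# wrapping the update in a lock turns it into a protected critical section.
import threading

counter = 0
lock = threading.Lock()

def bump(n):
    global counter
    for _ in range(n):
        with lock:               # the critical section
            counter += 1

threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)   # 200000 with the lock; possibly less without it
```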

For serious multiprocessing, and especially time sharing, the parallel processes were hardware protected. This was unfortunately but pragmatically combined with MMU and swapping protections also in the HW. The combination did not scale well, and the HW manufacturers did not fix this (to this day). Then programmers reintroduced unprotected threads within the processes (a really bad idea also … and to this day).

Meanwhile, in the late 50s John McCarthy was specializing in “great ideas that ran poorly on contemporary machines”. One of these was Lisp. Another was the idea of “fluents” (or “pseudotime”). The basic idea was not to let the CPU determine time but instead to simulate time along with everything else. This allowed a real definition of “simultaneous” and was a scalable (upwards) way to deal with and eliminate race conditions. (I’ve tried to explain this idea in a few Quora answers …).
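
Here is a toy rendering of the pseudotime idea (my own sketch, not McCarthy's formulation): every update at step t reads only a frozen snapshot of step t, and all the results are installed together as step t+1, so “simultaneous” has a definite meaning and there is nothing to race on.

```python
# Compute the next state entirely from a snapshot of the current state,
# then install it atomically as the next "pseudotime" step.
def step(state, rules):
    now = dict(state)                          # frozen snapshot of time t
    nxt = {k: rule(now) for k, rule in rules.items()}
    return nxt                                 # becomes time t+1 all at once

state = {"a": 1, "b": 10}
rules = {"a": lambda s: s["a"] + s["b"],       # both rules see the same t
         "b": lambda s: s["a"] + s["b"]}
print(step(state, rules))                      # {'a': 11, 'b': 11}
```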

Soapbox: computing has a strong tendency to hang onto an idea — often by patching — long after it has become obsolete — and also tends to avoid learning new ideas, even when tremendously useful. This usually results in dangerous bloated messes.

What is the origin of model-view-controller?

Ivan Sutherland — the inventor-originator of interactive computer graphics — had the idea of a clipping window to see a part of the very large Sketchpad world. This wound up being a program structure roughly like a camera which had an aperture that could be of any size relative to the world, and would show that portion of the world on the screen. Each Sketchpad object had a method that would render the object in the world, and the windowing apparatus did the job of transforming world coordinates to window coordinates (this was done as part of the rendering operation).

image

Ivan Sutherland in 1962 at the building-sized TX-2 computer working with Sketchpad on a truss bridge design with the screen showing part of the whole bridge that has been zoomed in on. He is holding a light pen used for pointing and graphical input. This year is the 60th anniversary of Sketchpad.
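
To give a feel for the windowing transform described above, here is a hedged little sketch in Python (the window, screen size, and point values are all made up; Sketchpad's actual machinery was far richer):

```python
# A "camera" window onto a large world: map world coordinates of a point
# into screen coordinates, and report whether it lands inside the screen.
def world_to_screen(p, window, screen_w, screen_h):
    (wx, wy), (ww, wh) = window          # window origin and size, world units
    x, y = p
    sx = (x - wx) / ww * screen_w
    sy = (y - wy) / wh * screen_h
    visible = 0 <= sx <= screen_w and 0 <= sy <= screen_h
    return (sx, sy), visible

window = ((100.0, 50.0), (200.0, 100.0))     # looking at one part of the world
print(world_to_screen((150.0, 75.0), window, 640, 480))   # ((160.0, 120.0), True)
```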

The next version of Sketchpad was a 3D system (done by Timothy Johnson), and it supported 3D multiple views of the world.

image

Ivan’s next major project was the first virtual reality headmounted display, and this required tracking the head of the wearer, and using this to sample from the interior 3D world for each eye.

image

So there was a similar software structure like a telescope or camera that had one end in the virtual world, and had the other on the display face. Both the to be visualized objects and the “cameras” had six degrees of freedom. The virtual world side was sometimes called the “camera”, and sometimes called the “window”. The real world side was often called the “viewpoint” (at Xerox Parc, this was renamed the “window” to help explanations to Xerox).

When I was a grad student at the U of Utah, I overlapped with Ivan, the HMD project, and its approach (meaning: there is a world with virtual objects, and these include camera objects whose purpose is to look at parts of the virtual world and send these to be displayed on windows on a display). The simple linear equation for coordinate transformations was now a many degrees of freedom matrix that required quite a bit of special HW support (that had to be built from scratch).

At some point I started thinking about other than graphical virtual objects in the “everything is an object” ideas I was pursuing. The idea was that every object should know how to render itself, and this should be combined with a windowing-viewing mechanism. A fun thing was the idea to use multiple windows to show different aspects of the objects in the view. I first saw this in Sketchpad III, then in the Engelbart system in the late 60s (it was mostly used for filtering detail), and used the idea in the FLEX Machine that Ed Cheadle and I did.

The earliest versions of Smalltalk were simple and slow, and included multiple windows and “turtles” that could be part of a window (that would then clip what the turtle drew).

image

A Smalltalk-72 screen showing an editable view of an Elvish character, which is used in one of the paragraphs in “the galley editor” (an early experiment in desktop publishing). The galley is a collection of views of both paragraphs of text and a painting view with a sketch of Frodo and Gandalf. The user interface for a painting view pops up around it, and it in turn is made from views of the controls.

Another early plan — not done until later — was to have a slippage scheme, because the simulation “frames” from the model could often be computed much faster than graphical rendering of the view, and thus one would like to decouple the updating of the semantics from the updating of the display. This was later actually done in a principled way when MVC was implemented.

Shortly after Smalltalk-76 was done, we were visited by Trygve Reenskaug from Norway, who was very interested in large scale planning (such as for the Norwegian ship building industry). He was indefatigable, and decided to do a real planning system in the new Smalltalk. I attribute the first formal and working examples of MVC to Trygve, who among other things used them to view and change a complex changing schedule and to have different windows show lists, Gantt and PERT charts, etc., each of which could be edited directly and would automatically cause the model to update itself.

image

Trygve’s first planning system ca 1976–7 showing three different dynamic views of the same plan with the same item (A21) selected and editable in each view. This is arguably the first principled use of MVC.

Adele Goldberg and Dave Robson got very interested in this and took it further to the general idea that there is a “model” system, which has the semantics of the system, a “viewing system” which can get the needed parts from a model for each view, and do the screen painting, and a “controller” system that deals with the input devices employed by the user, does the sometimes difficult inverse transformation on the view to find what is being interacted with, and informs the model about needed changes.
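
A minimal sketch of that separation in Python (illustrative only; these are not the Smalltalk-80 classes, and the names are invented):

```python
# The model holds the semantics, views pull what they need and repaint,
# and a controller translates user input into changes to the model.
class Model:
    def __init__(self, value=0):
        self.value, self.views = value, []

    def changed(self):
        for v in self.views:
            v.update(self)

class View:
    def update(self, model):
        print(f"view repaints: value = {model.value}")

class Controller:
    def __init__(self, model):
        self.model = model

    def user_typed(self, new_value):   # the "input device" side
        self.model.value = new_value
        self.model.changed()           # the model notifies its views

m = Model(); v = View(); m.views.append(v)
Controller(m).user_typed(42)           # -> view repaints: value = 42
```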

This more formal system was put into Smalltalk at some point — maybe not until Smalltalk-80 — and both overkilled and underkilled. Part of the underkill was that originally there was a class “Model” with a protocol for things to be viewed by members of class “View”. But since you wanted to view everything, the model protocol should be part of class Object. Etc. This eventually happened.

And so forth. For the “simple things should be simple” part, there needed to be default views that would be automatically included in any class definition. One of these that was much used later was Morphic, done by John Maloney — originally for Self — and then for Smalltalk. Similarly, having default controllers was very useful.

The whole framework was a very useful way to separate concerns (and I think even more should be done pragmatically with the details of creating MVC subsystems).

What was the output of the computer when there was no screen and speaker?

My vote for the first really working programmable computer is the EDSAC by Maurice Wilkes. Among its intrinsic output devices were three CRT display tubes that looked at various portions of the machine, including the memory.

image

image

Using bits in memory to show a tic-tac-toe board.

So: the first real computer did have displays — and used standard paper tape input and a printer (from earlier teletype technologies).

image

Computer Scientists: if you had to recommend just a single book to try and teach someone computer science what would it be?

My parallel to this question would be to replace “computer science” with “Physics”.

60 plus years ago when I went to college, good answers would have been “Sears and Zemansky” or “Halliday and Resnick”. These were both thick tomes aimed at more than a year of first year Physics, and covered most important aspects of the field in a first encounter form (but with enough math depth to be like more advanced courses).

When I wrote this answer, there were 2 other answers and a comment. The recommendations were two books about programming, and one, more general, but without a needed next level of detail (the Hillis book “The Pattern in the Stone”).

I like all three of these books, but computer science is not the same thing as programming, so I recommend “The Pattern in the Stone” quite a bit as an intro to “computing” — it’s a terrific book — but it doesn’t have needed details, etc.

Nowadays, there is quite a bit more confusion about what “computer science” actually should mean. Most students I’ve asked over the last 25 years or so, have given an engineering — rather than science — definition. In the 60s, the first Turing Award winner — Al Perlis — said “It’s the science of processes: all processes”.

Science is trying to make theoretical representations that mirror phenomena and their possible causes. The phenomena could be generated by nature or via constructed artifacts (like bridges and dams, etc). In the latter cases, we have both theories that guide the constructions and the usefulness of trying to find better theories for the artifacts. In the case of programming, we are essentially getting a computer to run a theory to produce phenomena.

And so forth. I’d love to see a single book that has that level of sweep! And there might be one that I don’t know about. Right now, I think it would require cherry picking for topics from a variety of separate books and papers combined with a particular pretty unified conception of what “computer science” today should be about.

The lack of this makes me think that we don’t quite have a real field yet, in the sense of “science field”. But I do think that you could do a lot by fleshing out Hillis’ approach to encompass Perlis’ early posit. For example, you could look at an amalgam of approaches like those of Nisan & Schocken, and of SICP, and general systems (and these can be simplified without removing depth, etc.).

Aside from programming, can they discover a way for programming digital devices without programming languages? If yes, like what?

Punched card machines, e.g. from IBM were digital devices that were programmed with plug board wiring, not programming languages. Some of them were quite capable.

image

I did a little of this in the early 60s before learning the 1401 in the Air Force …

The dominant paradigm in modern programming languages is Object-Oriented. Why should developers abandon this paradigm and adopt Golang if their application doesn't have a huge number of concurrent IO requests?

As I tried to point out in a previous answer, I don’t think that “object-oriented” is a “programming paradigm”; instead it is a way to define computational things at will (because the building blocks are semantically whole computers). So it is basically a meta-framework.

I think a “paradigm” for programming should be both a style and somewhat prescriptive as to how to go about using the basic materials to make things.

For example, “object-oriented” allows things like data structures and procedures to be defined, or the kind of functions that “functional programmers” like, or highly modular systems that integrate separations of concerns, and so forth, including old style AI expert systems with complex relational ontologies, etc. Some of these scale well, and most have great difficulties.

I always judge a programming language by how “meta” it can be when needed. What is required when “Paradigm X” is needed to best program a new system? One hint is to ask to what extent, and how practically, new features can be made part of the programming language being used.

This requires the ability to define and install new features — with regard to syntax, semantics, and pragmatics — at any level of a language (including its own building blocks).

This has been experimented with over the years by various top researchers (one that was pretty nice is described in the first 100 pages of “The Art of the Metaobject Protocol” — this is not the only way this could be done, but it is very suggestive of good pathways). TaotMOP was done almost 40 years ago, so there’s a good chance that something better has appeared since, and I just am not aware of it.

The key idea is nicely independent of the exact methods used, so there is more leeway than most people think. I think this whole question is a difficult area for the computing of today to contemplate without feeling overwhelmed, because it is so design intensive, and today’s general approach is very “agglutinative” with less forethought and overview than most situations actually need.

What does a computer without an operating system look like?

Like many computers of the late 40s, the 50s, and into the 60s. They were “bare” HW plus a few external utility programs, for example, the IBM 1401, or early DEC computers.

Sometimes one would write what today might be called a “mini-operating system” to aid work flow. For example, we had an 8K 1401 in the Air Force, and were able to put a tiny batch job “OS” (in the upper thousand characters of memory) that would automatically deal with running batch jobs, telling the operators what tapes to mount, etc.

We also had a Burroughs B220 in which a data tape had, as its first segments, code that understood the format of the tape and, “object style”, a table of operations for the semantics and pragmatics of the rest of the tape.

In any case, the notion of an “OS” appeared early in one respect: as code to provide a better programming environment than the bare hardware did. In many examples in the early days these were supplied by macros from a standard library.

The ca 1960 desire to share resources for multiprocessing and time-sharing forced development of something more like modern operating systems on one hand and memory protection and mapping on the other. The latter lagged the former (for example, CTSS on the 709 at MIT rolled in and rolled out whole core images; it had no map). IBM was very late on most of this.

On the other hand, the early Burroughs B5000 had a quite elegant fine-grain memory allocation and control scheme, and early in the 60s not only had a resource-shared OS but also multiple physical processors, all controlled by the first OS written in a higher level language.

How did Ada start programming when computers had not yet been invented?

You don’t need a computer to write a program, or to execute it. For example, an early algorithm that was done by hand for many years computes the greatest common divisor (one of the earliest versions is attributed to Euclid).
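
Here is that algorithm written in modern notation (a Python sketch), just to show how little machinery it needs; people executed exactly this kind of procedure by hand:

```python
# Euclid's algorithm: repeatedly replace the pair (a, b) by (b, a mod b).
def gcd(a, b):
    while b != 0:
        a, b = b, a % b
    return a

print(gcd(48, 36))   # 12
```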

One of Ada’s programs was how one would compute Bernoulli numbers on Babbage’s proposed Analytical Engine. The Engine was never built, but she probably executed at least parts of it by hand (and also left a few bugs). Take a look at

https://twobithistory.org/2018/08/18/ada-lovelace-note-g.html

to get more of a feel for this program.
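
This is emphatically not Ada's Note G program, just a modern hedged sketch of the same task using the standard recurrence for Bernoulli numbers, to give a feel for what the Engine was being asked to do (the convention here yields B1 = -1/2):

```python
from fractions import Fraction
from math import comb

def bernoulli(n):
    # B[0] = 1; then B[m] = -(1/(m+1)) * sum over k<m of C(m+1, k) * B[k]
    B = [Fraction(1)]
    for m in range(1, n + 1):
        s = sum(comb(m + 1, k) * B[k] for k in range(m))
        B.append(-s / (m + 1))
    return B

print([str(b) for b in bernoulli(8)])
# ['1', '-1/2', '1/6', '0', '-1/30', '0', '1/42', '0', '-1/30']
```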

A side note here is that in the early days of computing, most programmers did most of their debugging at their desks, via hand execution (I certainly did when I started in the early 60s).

What guidelines would Alan Kay recommend to model something confusing?

First, is the area “intrinsically confusing” or can the modeling clear things up?

I think of most things that have a “systems nature” as being the former, so I like the idea of “multiple covering heuristics” that together can handle the errors etc. of the whole — a simple great example is TCP/IP, whose combined half dozen or more methods operate on and sufficiently stabilize the whole Internet.

In more semantic areas, more semantics — especially ones that allow reflection — are usually what is needed to provide “multiple covering heuristics”.

I think most people who think of what they are doing as “making and modifying systems” will also approach things from the “error detection and correction” standpoint.

My past research groups got a lot of mileage by building a combination of powerful simple languages combined with the most powerful debugger we could come up with.

I think that — today — most of this needs to be done by a “coherent goal system” itself, and that what’s needed is a second or third order organization of elements that can deal with itself. For example, Facebook has many hundreds of system integrity tests, but they were not up to knowing the goals of the system (for example that it needed to stay on the Internet!) and this led to at least one very large crash of everything.

The lack of semantic modeling and reasoning etc — especially of overview goals that are dynamically involved with software systems — is one of the most glaring lacks in software today (and something that was more or less abandoned 30 or more years ago).

Added April 2nd:

Ken Kahn directed my attention to the excellent, very recent, paper by Doug Lenat: “Creating a 30-Million-Rule System: MCC and Cycorp”.

This is an interesting example of how to think about and deal with “modeling something confusing” i.e. human common sense reasoning — and it is also an example of the kind of system that should be behind the programming systems of the future.

What paradigm is the successor to OOP?

“OOP” (as I think of it at least) is not really a programming paradigm, but is much more a workable “universal definition” scheme, and that is especially suited for defining large systems.

It is basically an abstraction of an unlimited number of complete computers networked together (and where — by definition — the interior of the computers can also be a system of computers, both real and virtual). If you only worry about semantics, this provides a very powerful universal building block at all scales.

Metaphorically, this can be thought of as a “universality” like that of NAND or NOR, vastly scaled up: you can build any kind of functionality you want, but there is no hint of how to design the organization of universals.

In practical terms, since you can imitate (simulate) any idea, you could choose to use a real OOP framework to simulate old familiar ideas — like data, procedures, etc. — or you could choose to use the framework to deal with vast scalings and new situations brought by Moore’s Law.

Historically — and unfortunately — “OOP” starting in the 80s has generally chosen to simulate old familiar kinds of things (via its subset use as Abstract Data Types). For many reasons, this kills “graceful scaling” (and has done so).

So — for general/commercial use — “OOP” needed to be packaged not as a programming language — too many degrees of freedom for most programmers — but as a framework loaded with powerful design schema to help programmers learn ideas far beyond mere programming. That didn’t happen.

One way to think about “declarative programming” is via an analogy to a system of “simultaneous equations”. If there is a solver that can solve them, then it is extremely handy to just add a new equation for each new situation, and let the solver find a viable solution for all together.
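
As a small illustration of the analogy (the tiny solver below is mine, for the 2-by-2 case only): each new situation adds an equation, and the solver looks for values that satisfy all of them together.

```python
# Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule.
def solve_2x2(a1, b1, c1, a2, b2, c2):
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None          # no unique solution -- some systems have none
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y

print(solve_2x2(1, 1, 10, 1, -1, 2))   # (6.0, 4.0): x + y = 10, x - y = 2
```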

Note that a system of simultaneous equations quickly gets difficult to grasp as a whole — and some systems — even of linear equations — don’t have solutions. And many systems of equations don’t accurately describe the desired system.

Eventually, this will give rise to a higher-level way to think about this, e.g. matrix algebra. (But still, how can this be grounded in more meaning than transitory goals in the minds of programmers and management?)

Declarative programming is all this and much more (an insurance system I’m aware of has over 100,000 requirements — and the working system is a partially unknown approximation to those requirements).

To me, all this (and more) implies that a next real paradigm (in the deep sense of the term) would be “knowledge based system building” of various kinds.

I think the minimum would be to have the system pretty much understand the goals of any new system added to it, so it can do a lot of the feasibility checking (and much deeper) that humans sporadically and randomly do today.

Such an active knowledge context, for example, would have prevented the recent Facebook crash (in knowledge terms, Facebook’s code didn’t even know that it was hooked up to the Internet and that it didn’t want to unhook, etc.)

Making a minimum knowledge context — and languages that could be in terms of it — is a monumental amount of work and invention (not unlike the amount of work and invention required to come up with our current wave/paradigms of computing).

I don’t see much taste for this amount of work and invention in the field today, nor any inclination from funders to try to do a restart of the field.

Paradigms in Physics are large qualitatively separated and different whole points of view (like Newtonian Physics vs. Quantum Physics). If we take that to heart with our field, then we should be looking for something revolutionary (and likely: large).

What does Alan Kay think about Douglas Lenat's Cyc?

CYC is such a large complex and long lasting effort, that it doesn’t fit within the scope of a Quora answer. But as with most artifacts in computing, commentary can be somewhat separated into the idea of a system, and the processes to implement the system.

The original idea that led to CYC was articulated by Doug in the early 80s as: “Instead of automating the words in an encyclopedia, we should automate the ‘white space’” — that is, we should automate what a reader of an encyclopedia needs to know in order to read an article in an encyclopedia, i.e. all the commonsense knowledge that is used as context. And let’s do it as an “engineering project”: for example, can we (by brute force if needed) create an artifact that can respond really flexibly (rather than brittlely) to many different kinds of contexts, etc.?

Doug likened the expert systems of 40 years ago to brittle islands, and what was needed was a “commonsense canoe” as a universal fall-back process to navigate where there was little or no expertise.

This is a great insight and idea. It is pretty much lacking in the so-called “AI” systems of today (and it needs much much more than ML type correlations in order to be realized).

An enormous amount of work went into a number of CYC systems, which wound up with millions of relationships to try to model commonsense reasoning. These went deeper than any other artifact that I’m aware of, and had many important successes. This work had to struggle in the “AI winter” as ML drew interest and funding away from the deep requirements of “real AI”.

I should be clear here that the CYC symbolic approach is not antithetical to ML: both are needed. The problem today is that the symbolic approach has been omitted or is too weak to do its deep part of the job. In today’s terms — now that Kahneman wrote his book on Thinking — what is lacking is the deep slow thinking in terms of models of the world, rather than the fast correlations and reactions of the Type 1 quick neural apparatus.

A bottom line here is that CYC is well worthy of study, both for what it can do that other systems can’t, and also to learn more about how such a system should be organized for a next pass.

In any case, we should be grateful that Doug Lenat has been a bulldog on this project. Without his persistence, we would have no artifact “good enough to be criticized”.

Ken Kahn provided a very useful comment below that I’m copying in here:

This led me to find and read the excellent, very recent, paper by Lenat: “Creating a 30-Million-Rule System: MCC and Cycorp”.

And the paper ends with a nice example of how ML and Cyc can work together as Alan pointed out.

One minor quibble is with the discussion of expert systems: while they were popular at Stanford, a few other places, and some startups Lenat describes, they were NOT popular at the MIT AI Lab, Yale AI, or some parts of CMU AI.

Where does Alan Kay see computing in both science and industry going, in terms of systems and how we think about them?

I’ve been much more effective at doing things than in armchair predicting, so I don’t know what to say here.

Over the last several years I’ve been asked to do a number of talks on “Software Engineering” (they are on YouTube), and each talk took on different aspects and problems, especially in industry (which is so large as to defy easy generalizations). Still, the parts I did look at seemed pretty far off from much better, much more scalable approaches, and their influence on university “computer science” courses has been mostly disastrous.

As to “systems”, I think there has been very short shrift all around — in many ways, the past of both CS and SE have drowned out both good ideas in systems research and also prevented needed inventions.

To just pick one area, “graceful scaling” — a proper area of systems research — is pretty much not found at all in industry. “Systems with integrity” — in a wide variety of meanings for this phrase — are essentially non-existent — witness recent meltdowns e.g. Facebook’s.

Systems meanings and reformulations, etc., are another big area that should be taken up in the large.

So, the most I can say is that I don’t feel good about current practice — quite queasy actually — but this is not much of a “prediction”.

---

The initial answer to this is given above. But I just realized that “science” in the question might mean “Science” rather than CS. The former is about negotiation between teased out phenomena and the models we make to try to explain the phenomena — the former is “external” and the latter “between our ears”.

As Judea Pearl has pointed out, ML is pretty much large scale “curve fitting” and that “correlation is not causality”. But the kind of loose coupling and the size of system and training sets today has allowed some progress to be made in previously vague areas such as visual and audio representations, and with various large data sets.

Unfortunately, this has quite overshot where it is actually useful, to become a more blind and more dangerous way to try to compute in general.

Correlation is good for drawing attention, but it would be nice to see computing catch up with real causality and causal thinking and reasoning. This also requires a kind of loose coupling, but one much harder to set up and keep sane because of the much more stringent modeling that is being done.

In theory — looking at our present from the past — today’s and tomorrow’s computing could be really usefully put to work to start creating and exercising real semantic models of the complexities of our world — and to be able to tell us and teach us about itself.

This fits into another large facet of “systems research” and it is a good one.

What's the point of life?

The point of life is to live life with a point

Has Alan Kay voiced any opinions about cryptoeconomics, blockchains, and related "crypto" ideas currently gaining a lot of exposure?

I haven’t. I feel I need more supported opinions — right now I have “feelings”

Is it true that the first computer was a loom? How is a loom a computer?

Here is an answer that specifically covers your question about looms:

https://www.quora.com/How-was-the-first-computer-operating-system-made-if-there-were-no-programming-languages

How and when did digital computers come into existence?

This question needs to be crisped up a bit. “Digital” happened before computing. And many computers were not programmable. Computing was done long before computing machinery was made and worked.

Etc.

Lots of people realized it was possible, and wanted it. I tend to give the prizes to the first all around “everything” with both hardware and software, etc. I like three groups of pioneers: Konrad Zuse in Germany during WWII, Maurice Wilkes at Cambridge, and Williams et al at Manchester, with the latter two groups setting up the earliest working foundations after the war.

What do you think about the idea of writing at least parts of law (some legal acts) in a technical language that would actually be a programming language? (So "computable law")

I got asked to give a keynote last year at the Stanford FutureLaw conference on Computational Law (yes there is one).

Start at 1:25 if needed.

https://youtu.be/FbrbNI9FbzQ?t=85

All of the existing programming languages are based on English. And some are far more problematic, as seen by the code poem "Black Perl." What would an inclusive, progressive programming language look like?

Start by looking at “chimera” artificial languages, such as Esperanto or Interlingua.

Personally, I don’t think this helps much if at all for programming.

What does Alan Kay think are the most interesting people, projects, and organizations working on education today?

I’m not comprehensive enough to give a good answer.

But the most interesting computer related one is definitely Bret Victor’s “Dynamicland” (check it out on YouTube).

Why did the designers of Smalltalk decide not to have traditional operator precedence for basic arithmetic? Did this decision lead to a more efficient interpreter?

For the same reason that Ken Iverson chose for APL. When you have an operator-rich expression language, it’s better to have one rule than 100. Lisp did it the other way: by explicitly showing the bracketing of scope.
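
A small illustration of what the single rule means in practice (my sketch in Python, not Smalltalk code): binary operators simply apply left to right, so an expression like 3 + 4 * 5 reads as (3 + 4) * 5.

```python
# Evaluate a flat token list with one uniform rule: strictly left to right.
def eval_left_to_right(tokens):
    ops = {"+": lambda a, b: a + b, "*": lambda a, b: a * b,
           "-": lambda a, b: a - b}
    result = tokens[0]
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        result = ops[op](result, operand)
    return result

print(eval_left_to_right([3, "+", 4, "*", 5]))   # 35, one uniform rule
print(3 + 4 * 5)                                 # 23, with precedence tables
```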

Why did Dijkstra say that “Object-oriented programming is an exceptionally bad idea which could only have originated in California.”?

When I heard this, I thought it was really funny for a number of reasons — Edsger and I were friendly (via Bob Barton) and he loved to come up with snide funny comments.

However, he certainly knew Nygaard and Dahl (of Simula fame) and that they were Norwegian.

He possibly knew that the earlier Sketchpad had been invented in Massachusetts.

He knew — via Barton — that I had my ideas while in Utah. Etc.

Perhaps more interesting is that he was the inventor of the Semaphore coordination mechanism (ca ‘62), and a semaphore is an object that is an instance of an idea about synchronization, with a local protected variable and only two protected operations. An irony is that this was a bit ugly to make “neatly” in the Algol of the day, and in Simula (the encapsulation could be violated). When added to an HLL, it was usually done as a “feature” (a special kind of variable) rather than thinking through the larger consequences of being able to make many such useful things. (And Hoare’s “monitors” were also objects, etc.)

The other fun side-note is that McCarthy had invented “fluents” a bit earlier, a much more useful idea than semaphores and monitors for avoiding race conditions, and one that was not well enough understood at the time to catch on.

He did like to pull chains …

How might computer science evolve to be worthy of winning a Fields Medal or a Nobel Prize?

Or vice versa? That would be interesting also.

In any case, it is strongly rumored that Nobel didn’t like mathematicians, so for whatever reason there is no Nobel for math.

And there is this to contemplate (which equates a variety of prizes that the Nobels don’t cover):

https://www.heidelberg-laureate-forum.org/

What are Dijkstra's major objectives for the development of structured programs?

It’s very difficult to do sufficient justice to Edsger. Besides being a foundational computer scientist — in the original strict senses of the term — he was also one of just a few critical gadflies for our whole field, and especially both CS and SE.

One way to think about a great person is that they are “great and interesting” whether they are right or wrong, or in between. This is because it is their unique perspectives that really count: they help us see from directions we couldn’t.

Those whose tiny sense of self and outlook allow themselves to be insulted by the great gadflies completely miss what “great” and “gadfly” and “progress” are all about.

One of the difficulties in dealing with the ideas of a great person is to be distracted by some of the “really good stuff” they are able to accomplish — which tends to be more rooted in an historical time — and to miss the “really great stuff” they are advocating — which tends to be for much larger durations.

In other words, the bug is to mistake the artifact for the bigger underlying idea. This has happened with a number of greats, including Dijkstra, Bob Barton, Ivan Sutherland, Doug Engelbart, etc.

One of Edsger’s greatest lifelong interests was to find out how to design and write programs that really worked and were “bugfree” — including “knowingly and meaningfully” bugfree.

Following the idea that a programmer should have enough of a goal in mind to both try to write a program to do something, and to be able to tell if it works as intended, what he wanted was to find ways to get the goal out of the programmer’s mind and into the code.

Most goals are predications, and computers present the twin problems of representing predications and running them. The systems that could do this really well — like Sketchpad — though incredible for any time — were not general purpose enough to do either comprehensibly enough.

So the question is: how can you organize — and confine — what the computer does do to approximate “runnable math”? The vast degrees of freedom of computing are fabulously powerful: how to be able to use as many as possible really safely?

Edsger came up with numerous approaches to this, many of them important. With reference to the question here, he was not looking for a religion or for “golden methods” (this was more the way Wirth thought about this).

You can learn a lot about the way he did think about things by looking at his extremely early Algol 60 compiler, followed by his approach to the THE operating system for the EL X8 computer.

One of the big ideas of the 40s — especially the 50s — is that the notion of a mathematical function could be imitated very well, if done with care, and — even better — that the idea could be widened to provide a machine independent abstraction for an idea, and especially for programs that could accomplish goals. This provided ways to create a design language and scheme with separated concerns for design, while also providing ways to combine the concerns in higher level, more goal oriented ways.

This quickly gave rise to several different kinds of programming cultures: one pretty ad hoc, and one “like Dijkstra’s” i.e. an attempt to be as “scientific” and “engineering” as possible with respect to design and building.

The “like Dijkstra” one had the disadvantage of not knowing how to “do everything right”, but it did have many starts toward deep and correct criticisms of the other culture — which pretty much did everything “wrong” and was so blinded by pragmatism that it couldn’t see it.

I think these two extreme cultures are very prevalent today, and both are quite entrenched — for many reasons besides personality leanings — and I think there is a lot of “loss aversion” and the “sunk cost fallacy” hurting progress on both sides as well.

A way to end this too long answer is to look at Edsger’s quip that “programming is logical brinksmanship”, and realize that the largest psychological difference between the two factions is that his strict computing and engineering approach wanted to avoid brinksmanship via more care and art. The term “structured programming” as used by him was larger than what it degenerated into.

A companion quote from his contemporary Tony Hoare is: “Debugging is harder than programming, so don’t use all of your cleverness writing the program!” It’s a reasonable generalization to say that many of the computer scientists, software engineers, and language designers back then were trying to find ways to write programs that could be reasonably debugged. It’s also fair to say that most programmers today code expecting their program to run and be easily debugged (this doesn’t work well).

Another quote along these lines of Edsger’s that is quite telling 60 years later today is:

At least in Holland, the intellectual level needed for systems design is in general grossly underestimated. I am more than ever convinced that this type of work is just difficult, and that any effort to do it with other than the best people is doomed to either failure or moderate success at enormous expenses.

All the best systems designers I’ve known over the years have put a lot of effort into confinement — and have come up with numerous mechanisms both hardware and software — some of which are completely critical today.

By and large, most programmers violate useful and needed confinements right and left regardless of the tools they have (some of which can really confine well). Or what is done is via extreme kludgery (naming 5 of these right now will help understanding of this note).

The HW vendors, especially recently, have done very poorly at dealing with and helping with these issues.

And it would be great for those who do realize the importance of confinements to devise much better tools that fit the needs and scalings of our time.

---

This is already much too long, but I can’t resist telling “The Sphere Story”. One of the lab managers for a few years at Parc was quite a character. He had been the Chief Scientist at a large company, and while there decided that he would like a stainless steel sphere on his desk for a paperweight. He ordered a 4″ one from the shop and eventually forgot about it. 10 months later it showed up with an invoice for $10,000 (about $85K today). After he screamed at them on the phone, they informed him that he had only said “4 inch diameter” with no tolerances, so ….

One of the many reasons that Sketchpad worked so well is that Ivan Sutherland was an engineer’s engineer, and engineers work to tolerances. He decided to make Sketchpad work within reasonable tolerances, e.g. minimizing the least-square-error fits of the multiple constraints. This allowed very difficult non-linear multidimensional problems to be handled right off the bat.

Compare this with the desire of mathematicians to have “not-true” mean “false” (disclosure: I have a degree in pure math). I remember being shocked by many things in Sketchpad — and this was one of them!

And compare this with the amount of “noise-limiting and correction” in almost all other parts of computing. If you are “thinking system” you are going to have noise and will design the system to work in spite of the different kinds of noise present (including human beings!). It’s that a very different kind of mathematical thinking has to be done besides “A is not not A”.

Note that this doesn’t invalidate Edsger’s large goals, but it shows how poorly the early conceptions of programming have scaled. This non-system view is present in most programming languages in heavy use today, and just doesn’t match up to what is actually needed.

What were interviews like at Xerox PARC?

In brief: in the first few years of Parc, Bob Taylor wanted “only top people” (meaning “really top people”), and wanted to eliminate rivalries as much as possible.

So he set up a laborious process in which everyone had to interview a candidate, and then discuss, and take a vote that had to be unanimous for the candidate to make it. This took forever.

Of course we all bitched about this, but Taylor’s process really worked. Each new star was greeted with open armed warmth, and they immediately fit into the somewhat ad hoc non-managed research collaboration that Parc enjoyed.

Why is human communication being advanced by so many revolutionary and very popular technologies while human collaboration isn't? In order to communicate, one must first collaborate, right? Why so much help to "share" and (almost?) none to promise?

There is a lot of excellent content related to this question already on Quora.

It’s worth pondering two built-in human drives — competition and cooperation — and how they have played out in various cultures over history. We are social, and both want to (and generally need to) live in groups to survive. This requires degrees of cooperation. But then we generally compete at most levels.

I think the large claim that most advances in human civilization have come mostly from cooperation can be substantiated (and that various forms of competition are mostly not positive, and even when they seem to be, are of less worth than cooperation would have been).

And I think that we have to consider “human communication” to be much more than getting signals of one kind or another from human to human. It’s worth comparing the thought patterns of oral societies to those with writing and schooling: they are qualitatively different, and it is the latter that is associated with the building towards “civilization”.

Most recent media technologies — from the telephone on to radio, TV, etc — have facilitated oral modes of thought, including the oral structuring of thought (e.g. chat and Twitter). I.e. the facilitation is of means that require little to no learning, and thus require no major changes in skills/internal-structurings.

One of the large shifts in perspective about learning media in the 20th century is the realization that it is the difficulty of learning that requires large changes of skill/internal structurings, and if this is coupled with new powerful ways to represent ideas, large positive changes can be expected. These insights were famously associated with McLuhan, but many others contributed — Innis, Havelock, Ong, Eisenstein, Postman, etc.

Engelbart’s large contributions to the world included the idea that a reasonable “augmented human” is not created by just giving a powerful tool to a person.

Instead, he pointed out that (a) “augmentation” included most strongly the non-optional requirement for deep training and education to acquire deep skills in using the great new powers of technology and (b) that since most things are done by humans in groups, it is critical to also “augment the collective IQ of groups” with training/education to use powers with collaborative tools.

One of the underlying ideas here is that available powers plus the impulse to compete generally results in poor to bad uses of the powers. Simplistic image: cave people with atomic weapons (which is pretty much what we have today).

Another way to look at this issue is that deep training/education is part of what “developed cultures” require in many different areas — including science, engineering, etc. and in developed music and arts. They all require quite a bit of education and training to deal with. They turn into trivial and often ultimately dangerous pop culture expressions and weapons without the deep prep.

So — to gesture at the question — I would say that “human communication” with respect to important discourse has been mostly retrograded by poor untrained uses of modern technologies, and that our stronger competitive impulses are overwhelming needs for cooperation.

McLuhan pointed out in the early 60s that a networked world would produce a kind of “global village” (which he was not in favor of) where, among many results, would be that everyone would know what everyone else was doing. A disastrous consequence would be massive loss of identity, and fierce efforts — including war — to reclaim a feeling of it. One of the manifestations would be demands to participate but not coupled with interests to “cooperate”.

Who is the person to whom the invention of the computer is really credited to? There are different views on this particular issue; Alan Turing, John Von Neumann, Charles Babbage?

Goethe once wrote: “We should all share in the excitement of discovery, without vain attempts to claim priority”. I like this because it invites celebration of many ideas from many sources at many times. And it also still allows us to identify real “superheroes” whose efforts made significant qualitative leaps — these people have been critical for progress, and I think it is important to identify as many as we can.

It’s also important to keep in mind that “computation” created the paths for “computers”. You seem to be asking about the latter, but it’s the former that is more important (and which includes the latter). The former has to do with representations and manipulating them, including algorithms and notations — so coevolved with the development of maths — and the latter has to do with making machines that can help do this (and many analog means for computing can also be found in deep antiquity).

I’m guessing that by “computer” you also mean “stored program computer”, which is quite a subset, albeit an important one.

For modern times … It is possible — even likely — that Babbage and Ada had a good idea of the universal nature of being able to manipulate what they first thought of as representations for numbers.

I like to start with Charles Peirce in the latter 19th century, whose analysis of Boole’s book produced the universal logic operation of “NOR”, and who said that you should be able to make a computing machine from these.
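
A small check of Peirce's point, written as a Python sketch (the helper names are mine): every Boolean operation can be built out of NOR alone.

```python
# Build NOT, OR, and AND using nothing but NOR.
def NOR(a, b): return not (a or b)

def NOT(a):    return NOR(a, a)
def OR(a, b):  return NOT(NOR(a, b))
def AND(a, b): return NOR(NOT(a), NOT(b))

print([AND(a, b) for a in (False, True) for b in (False, True)])
# [False, False, False, True] -- the AND truth table, made only of NORs
```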

During that period, there were many paths taken to represent and remember ideas via mechanisms (some of the earliest most interesting ones for the combination actions of pipe organs). An electronic (vacuum tube) digital memory — flip flop — was made by Eccles and Jordan in the early 20th century, and much experimentation was being done using relays for both calculation and remembering.

I think Turing’s paper in the 1930s was quite significant — not for the math — but for the essentially engineering approach he took, and especially the demonstration that very little mechanism plus general memory for “marks” could do everything. This strongly suggested how practical people should go about making stored program computers, and all that happened subsequently used Turing’s model. (What is called a “von Neumann machine” today is really a rendering of a universal Turing machine.)

However, the work of Konrad Zuse during WWII seems to have been mostly motivated by his own ideas — some of them more advanced than the rest of the world. And many secret, practical and large efforts were made at the Bletchley Park code breaking center during the same time.

So: two things to keep in mind (a) there was a lot of experimentation going on regardless of Turing and (later) von Neumann — for example George Stibitz at Bell Labs made an interactive complex number calculator from relays that could be used via dial telephones, and (b) that the first three stored program machines to work (not counting Zuse) were all British, and had different architectures than the Institute for Advanced Study computer proposed by von Neumann.

An interesting sidelight is that making the “simple general memory” turned out to be much more difficult than the “little mechanism” (early on using vacuum tubes). The latter were big, expensive, etc, but relatively easy to deal with. By contrast the various attempts at using mercury delay lines and persistent spots on CRTs were shaky and fraught with errors. It wasn’t until ca 1952–3, that Jay Forrester made coincident current core memory practical for the Whirlwind computer at MIT. This launched the computing industry.

So let’s just “share the excitement of discovery without vain attempts to claim priority”!

Why do some media amplify cognition more than others?

A comprehensive answer is beyond the size of a Quora response.

But we are “thinking” all the time, and part of our thinking is highly influenced by the information given to us by our senses: especially kinesthetics/touch, visual/configurative, symbols. The latter is usually via primary sense impressions via visual or auditory.

A central question to ponder is whether “thinking” can be improved by practice and by method — in other words, it is an activity that has things in common with physical learning e.g. sports (and this seems to be the case).

One kind of media that can be invented is that which tries to represent what we are used to getting from the world: for example, visual and auditory phenomena, perhaps in the form of paintings or movies or recordings. This is both of great convenience and also requires very little new learning to be done in order to make use of it.

The next central question is to ponder a difficult to learn medium, that requires quite a bit of new method and skill and exercise and repetition (even some psychic pain). What state will one’s brain/mind be in after fluency is gained?

Could the new mechanisms set up by learning to make the new medium fluently accessible also be new mechanisms for how thinking is done?

That seems to be the case for reading and writing. Many studies over the last century indicate that literate humans/societies think qualitatively differently than oral societies. The biggest differences — by far — are found with the combination of literacy and schooling.

There is a large literature about this including important contributions from McLuhan, Postman, Ong, Innis, Eisenstein, Cole, and many more.

I should add in here a little about “non-conformal” visuals — such as diagrams, graphs, schematics, etc. These have many of the properties of the “writing and reading of words”. They require a lot more learning and skill training, and what is built in the brain as fluency is attained also seems to provide new ways for thinking about many ideas — and often new ideas themselves.

(One of many fascinating excursions with these kinds of visuals was done by Ron Thornton at Tufts U in the US, where it was found that (a) a student’s ability at reading and interpreting graphs predicted well their final grade in a first year physics course, and (b) that teaching students to get fluent in reading graphs vastly improved their overall performance (and grades) in the physics course.)

One of many more facets of this large area is to ponder how humans respond to “wall to wall environments” — such as cities, countries, cultures, etc. — and to media environments such as writings, television, social media, etc. In brief, we humans will try to accommodate ourselves to the norms of the environments we are embedded in. This is because we are wired genetically to respond to “cultures” in ways that allow us to live in them.

So, it’s worth pondering the kinds of re-norming that happen to human notions of “reality” from being embedded in 24/7 environments.

Simple answer to the question here is that all media amplify cognition but in different manners and directions, some of which increase “civilization” and some which are “anti-civilization”.

It’s worth learning which and making choices …

How can I incorporate a historic/literary approach to a topic about computer science?

I like to bring in important events in the history of computing when writing about contemporary issues. The main reason is that a very high percentage of computerists are quite ignorant of what has happened — especially since things got rolling in the 40s.

It has generally been the case that the reinventions and rediscoveries have been more like “reinventing the flat tire” than “reinventing the wheel”, and many important ideas from the past have not been rediscovered in any form, and perhaps, more unfortunate, many recent ideas have been worse than any from the past.

If you are looking for examples, you first have to delve into the history of the field. There are many starting places, but I think comparing what Engelbart wanted to do — and what he and his group did do — in the 60s to the almost non-existent notions for the WWW, will surely give you pause. Berners-Lee has apologized for not knowing about Engelbart, but this didn’t help the last almost 30 years (for example, it was in the OS of the Engelbart system that conferencing meant not just sharing face-to-face video, voice, and screen images, but also giving all participants simultaneous access for interacting with the shared content).

As to the “literary” part of the question, I’m not sure quite what is meant — but one part of it could be to encourage computerists to learn how to write well, and especially to include computer processes and interactions as part of the media they write with.

According to Alan Kay, LISP's metacircular interpreter serves as the "Maxwell's equations of software". What are then the Einstein's field equations of software?

I think I said “Maxwell’s Equations of programming” … and I think this metaphor works pretty well for the late 50s and early 60s when Lisp appeared. It showed “in one eyeful” a very powerful way to model a superset of the kind of programming that was being done then.

Metaphors only go so far, so bringing Einstein in might violate the spirit of the original observation. But one could imagine a similar deep succinct treatment of “process-oriented programming systems” that would capture the essence of “internetworked processes”.

What did people at Xerox PARC think of the work Apple was doing?

Here is a different way things could have gone …

The three largest intertwined visions of the computer future we were committed to, all burst into view in 1962: Ivan Sutherland’s Sketchpad, Doug Engelbart’s “Augmenting Human Intellect”, and Licklider’s “Man-machine Symbiosis” and “Intergalactic Network” via ARPA funding.

This was all the same community. Sutherland was funded by MIT/SAGE, Engelbart’s was a proposal to the Air Force (later taken up by NASA, and then ARPA), and Licklider was asked in 1962 to use ARPA funds to do something about the future of interactive computing.

By the time Parc was set up in 1970 — and stocked with young researchers from the ARPA community — it was getting to be time for another pass at both interactive computing and the “Intergalactic Net” — partly because ARPA funding had been curtailed via the politics of the Viet Nam war, and partly because an inflection point in ongoing VLSI was just about to happen.

The basic sentiment at Parc was that we wanted to do “Engelbart, Sutherland, and Licklider for everyone, including children”. A number of systems considerations, especially UI ones, suggested that the larger schemes thought about for the Internetworking of all should also be employed at the personal level (and that this choice would be better and better as Moore’s Law moved along).

The decision at Parc was to build “time machines” that would start appearing in 1973, but would be in the form of networked personal computers of the 80s ca 1986–88 or so. This would provide a platform to realistically solve many systems, apps, UIs, graphics, and other problems that would pay off handsomely down the road. About 2000 of these machines were eventually built.

The catch was that it would be very difficult to do either “Sutherland” or “Engelbart” with these machines. For example, Sketchpad was programmed in terms of “executable solvable requirements” (which is just what the future demanded) but it would take years and more powerful machines than were feasible in the 70s to do this.

And, one of many important features of Engelbart’s system was that any content could be completely shared real-time between any number of users, and all of them could touch and manipulate. This was built into the bottommost layer of the system and was a wonderful thing. Just how to do this gracefully on a distributed system was a scheme yet to be invented.

These considerations upset Engelbart to the point of causing a schism in his lab at SRI. One group stuck with him and implemented their next system on a mainframe time sharing system (as the previous one had been) and this gave them what they were used to for a few more years.

The other group — which could see that the problems had to be solved in a distributed way — mostly wound up at Parc. And the first versions of the distributed system were quite a bit weaker in many ways than the mainframe version, but stronger in a few important future directions.

“All other things being equal” (but they aren’t), Parc would eventually have produced a qualitative leap along the directions of Engelbart’s “Augmentation” (that’s what we wanted).

The silicon inflection point that allowed us to make our time machines also allowed simple inexpensive “8 bit micro” computers to touch the consumer and low end business markets. This started a wave in the late 70s that had pretty much nothing to do with “how computers should be used in society”. The people who made them and the people who wanted them had completely different motivations — and most especially completely different contexts and understandings about “computers for people”.

This created a “societal normal” about every part of computing that was quite low, but that also touched a very large unsophisticated marketplace.

A deep point is that the Parc and Engelbart approaches required more computing power than people were willing to pay for if they didn’t understand what it was for. The weakness of the 8-bit and early 16-bit micros (and lack of microcoding) took programming back about 15 years into rather low level approaches. New programmers started learning C etc instead of the much higher level languages that were needed. When Intel et al started to make processors that were aimed at C rather than the high level languages, the game was pretty much over.

If you look at this from the “ARPA Dream” perspective, almost nothing from ARPA made it out except for things for which there was no competition. For example, the Ethernet was manifestly better than other attempts at local area networking. The Internet was manifestly better (and pretty much unique). Apple liked the UI work at Parc but — as Steve later said — missed pretty much everything else.

The Parc work at the end of the 70s lacked a lot of prime needed features that Engelbart had already done. It is doubtful that Apple would have picked these up if they had been there. Microsoft was much less idealistic than Apple and just as unsophisticated.

Different readers will have different reactions to this account.

For my part, after I left Parc in 1980 and first went to Atari and then to Apple, I was shocked at the naivete and general lack of knowledge and understanding of what computing was and could be all about — and especially the lack of any curiosity about possibly great ideas from the past.*

The present that happened was not about great ideas, but about generally unsophisticated individuals wanting to do things that made them feel better about themselves and that they could sell to other unsophisticates (that is not a basis that is going to give rise to great ideas).

Sic transit gloria mundi

---

* Note that Tim Berners-Lee did not know about Engelbart when he patched together what was to become the WWW. He has apologized for this since. But the fact that a computer person did not know prime history is as usual as it would be unusual for a physicist not to know about Newton and what he did. This is the difference between a “pop culture” and a “developed culture”. And it has hurt computing terribly, almost to the death of its best ideas.

How did people at Xerox PARC navigate dealing with the corporate side of Xerox?

There were different personalities at Parc with differing relationships and needs with corporate; likewise, Xerox was a large rich corporation with many divisions, and was somewhat like “England before the Magna Carta”.

So some relationships were cordial, some were warlike, and quite a lot in between.

I wasn’t involved in most of this — I had a research group there and lab managers, especially Bob Taylor and Bert Sutherland, viewed a large part of their jobs to be finding ways to “let the researchers research”.

I had pleasant relations with Jack Goldman — a New Yorker and a half — the Chief Scientist of Xerox. I had many skirmishes with Horace Becker, the head lawyer, who was both a bulldog, and one without understanding of how science makes progress. I found out later that what I’d interpreted as “skepticism” from a Xerox planner was actually deep dislike bordering on hatred.

In other words, the usual human circus that happens when some of the participants are not very high minded. All this intensified as Xerox moved from its could-do-no-wrong phase to starting to have difficulties in its markets. Money issues often bring out the worst in limited horizons.

How was software complexity changing as new technologies were introduced from machine code, through assembler to high-level languages and beyond?

A too short but pretty accurate answer is that high-level languages and architectures “almost always” lagged the demands of the software, and this resulted in more complexity than could have been the case.

Some of the exceptions are interesting. E.g. the B5000 high-level machine, with an OS written in ESPOL, a HLL systems language. And the Alto and Smalltalk system at Xerox Parc were similarly designed together to provide a match to the software requirements. I thought Erlang was a good HL solution to the problems of the Ericsson switching system.

But most projects try to tackle newer, larger problems with older, often obsolete languages and tools, which adds a lot of gratuitous complexity.

What is the idea from John McCarthy in the 60s that is better than semaphores? (KEYNOTE 1: Alan Kay - Rethinking Design, Risk, and Software — at 38:00 on YouTube)

I’ve explained John’s idea of “pseudotime” in a number of Quora answers. Perhaps you or someone might locate these …

---

Thanks to Dane Clarke for these references:

Here’s a few that are fairly relevant: Alan Kay's answer to What's the best way of combining functional and OO programming in practice?

Alan Kay's answer to What will happen if one rewrites the entire Linux using Smalltalk?

Alan Kay's answer to Rich Hickey, creator of Clojure has said that object-oriented programming is unable to model “time” properly. What does "modeling time" actually mean?

Why didn’t Xerox PARC explore the stylus in the Alto era?

We did aim for the stylus at Parc, and this was from experience with the GRAIL system at RAND especially (and GRAIL featured dragging via a stylus).

My thesis project — The Flex Machine — also had a stylus/tablet (less expensive than RAND’s, and not nearly as good).

The sensor technologies in the early 70s that were good enough for a stylus were much too expensive to fit into the “at least 100 Altos” plan (e.g. a commercial good tablet was about $7000, and the Altos were supposed to be about $15K - $20K total).

A great solution — I’m pretty sure by Chuck Thacker — was to make a small mouse with high res shaft encoders (I think it was about 250/inch) and this could be used to draw with and also had all the advantages of a mouse.

These were in great favor in my group. Eventually they got displaced by larger, lower-res, easier-to-clean mice, which were not so good for drawing, etc.

Parc also did a variety of touch sensors for displays — the Notetaker came with one, and there were experiments with Altos. These worked for coarse pointing and dragging but not for fine cursor control.

The mouse for the Mac was mostly about reliability — it was large and low res.

Why didn't Alan Kay show up when Adele Goldberg arranged the meeting with Carl Helmers in PARC?

First I’ve heard of it. When was the meeting?

Why was there a mismatch between Xerox management and PARC?

Parc was set up in mid-1970 as an initiative of the Chief Scientist Jack Goldman and backed by CEO Peter McColough.

Bob Taylor — former director of ARPA’s IPTO, and funder of the ARPAnet — was hired ca Sept ’70 to set up the computing research lab. He was interested in this because Congress was in the process of putting the “D” on ARPA and changing the funding rules in ways that would hurt the edge-of-the-art research that Taylor had previously backed.

Bob had dealt with companies before, and one of his stipulations was a legal agreement that Xerox couldn’t mess with the direction etc of research for 5 years. He then hired spectacularly well, and things got rolling.

A few years later XDS — a mainframe company that Xerox had bought in the 60s — failed and Xerox had to write off a billion dollars (a lot in those days). This cost McColough his presidency — he remained COB — and a very poor choice was made to replace him. Meanwhile, there were critics of the “elitist” Parc researchers, etc.

The poor choice made things worse, and was eventually replaced. But in 1976, the desktop Japanese copiers — and a big surprise: the Kodak copier — hit the market and started to put big pressure on Xerox.

The trend here was less concern with the future and more about protecting their “heartland business” (as they termed it).

And the 5 year agreement was now up.

More and more pressure of various kinds was put on any part of R&D that wasn’t generating revenues.

Taylor sacrificed his career to hold the ship together, and most of the edge of the art research at Parc was done successfully, and at “living lab” scales much larger than demos.

Things were quite tough around the late 70s, and Taylor was eventually fired in 1982 — his entire lab resigned with him.

Is Alan Kay the most influential person in the CS and IT fields?

I couldn’t possibly be …

How is it that "coding" is a superior term to "conceptualize", if they mean essentially the same?

As far as my 60+ years of experience goes, the two terms don’t at all mean the same.

In the 50s and 60s especially, “coding” was the act of writing code, and was distinct from “programming” which included design, and often in terms of an abstract framework — such as a flowchart — rather than actual code. The results here would be considered more than a “conceptualization”.

Typically, one started off as a “coder”, essentially hand compiling designs into working programs, meanwhile gradually learning how to design.

Today, I think that most people use “coding” and “programming” interchangeably.

What is Alan Kay's opinion on the Akka actor library?

Hard to have an opinion without trying something major.

However, I like the idea of “actors” for a number of reasons (a) they are a reflection of the way I thought about “object-oriented message passing systems” in the 60s, and instigated Carl Hewitt’s work on them, (b) the actor research in the 70s went further than we did at Parc (our goals were to write whole systems for personal computing on the Alto, whereas MIT’s were to do research).

Both of these paths worked out (we did write the systems we wanted, and the actors work at MIT came up with valuable insights).

A simple-minded theory of mine is that every research project needs a genius like Dan Ingalls, who can successfully turn big ideas into whole systems. The Smalltalk work at Parc benefited from this, and most other interesting languages haven’t (e.g. the lack of whole systems for (say) Haskell, Clojure, Akka, etc).

How do I master the art of Smalltalk?

Let’s widen the question to: “How can I understand and use Smalltalk-like language/systems?”

A few basic ideas here are:

One way to model any system is via parts/wholes relationships where a part can be a whole system, and relationships are in terms of intercommunicating messages.

If the fundamental building block is a “whole virtual computer”, then each part at any level can always be represented and the interior will always be safe from external meddling.

(So far this gives us a software rendering of the Internet (which indeed is what Smalltalk was all about — it came from the same research community that invented the Internet, and its invention was contemporaneous).)

We can get something like a programming language by choosing the form of the messages that are sent and how they are received.

We can get as many parts as we need if we have the facility to “instantiate” any thing we make.

We can use these facilities to make our programming language, development system, the systems we want to create, etc., all down to the metal.

All of these facilities are constantly live. And it is possible to make the system out of itself in such a way that it can not only debug itself whilst running, but it can debug the debugger whilst both are running, and — like the Internet — it need never be stopped for any change or repair or addition.
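To make the first of these ideas a little more concrete, here is a toy rendering in Python (my own illustration; it is not Smalltalk and not how any real Smalltalk is built): each part exposes only a message-receiving surface, and its interior state is never touched directly by the sender.

    class Counter:
        """A 'part' behaving like a little virtual computer: the only
        way to interact with it is to send it a message."""
        def __init__(self):
            self._count = 0                    # interior state

        def receive(self, message, *args):
            if message == 'increment':
                self._count += 1
                return self
            if message == 'value':
                return self._count
            return ('does-not-understand', message)   # graceful failure

    c = Counter()
    c.receive('increment')
    c.receive('increment')
    print(c.receive('value'))      # 2
    print(c.receive('reset'))      # ('does-not-understand', 'reset')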

---

Now here’s the tough part for most computer people at any time, and especially today — and that’s “we still have to do good design!”

A given Smalltalk — like Squeak or Pharo — will have a kernel like the above but will be furnished with a “library” of many prior design choices covering a wide range of needs and felt solutions to this and that. In today’s world “mastering” generally means learning all the choices (very similar to mastering an OS) so that as many of them as possible can be used going forward.

I think this is a good ploy for roughly the first 5–10 years of a feature’s life, but it is a disaster for very old features to retain this power.

And — for example — it would be crazy to use the definitional capabilities of such a system to simply model “bad old ideas” from previous programming experience. For example, an enormous bug in many so-called OOP systems is to use this great power to model old ideas that don’t scale well, like data structures and procedures.

It’s important to understand that the perspective here is “easily universal” e.g. if you feel that a “functional style” is good, it is extremely easy to model that with this “material”. There are no incompatibilities with programming styles here (this is a huge misunderstanding in the “normal” workaday world out there). This is a “meta” approach, and you do need to learn to “think meta” in order to make the best use from it (it is very hard to do good design in any area without allowing some “meta” to be part of the thinking process!)

To me: “mastering Smalltalk” means understanding what it is as a fundamental system building material, and seeing how to make the most expressively powerful thing from this that then can be used to create desired systems for many kinds of uses.

Smalltalk dates to the early 70s and its current general form to 1975–76. It ran on a personal computer with 128KBytes RAM (half of which was used for the display) and a 2MByte disk (somewhere between a floppy and a hard drive). It is of interest today because the kernel ideas were aimed at “the ages” and the sets of choices for the first library of extensions was carefully done (much of the credit for how well this turned out should go to Dan Ingalls).

As an example, here’s a tribute I did for Ted Nelson a few years ago using a revival of a 70s Smalltalk. Everything in the system including tools, development, the equivalent of an OS, and the equivalent of “apps”, was done in a total of 10,000 lines of code. This is why we put so much work into the design of Smalltalk — we felt that the VHLL abstractions we could make would allow a lot of functionality to compactly fit into the Alto — and that the Alto microcode (1K instructions) could run the kernel fast enough to make the result usable. Demo/talk starts at 2:15

https://www.youtube.com/watch?v=AnrlSqtpOkw&t=135s

Moore’s Law has had 30 doublings since then (and secondary storage even more). This is roughly a change in scale of a billion (some of which was used to create personal computers of much lower cost than the Xerox Alto). Also, Smalltalk was done before the Internet got done, and this represents cosmic scalings.

What this means is that you are much better off thinking things through again from scratch.

But let’s suppose that despite this you want a “modern Smalltalk-like system” to “master”.

To my eye, the kernel ideas have held up pretty well, but there are a few fundamental things that could be made nicer and more clear. For example, though Smalltalk can do any kind of messaging in a system, we didn’t put much effort into multi-machine massively parallel facilities. More effort is needed today.

The most expedient “bad” thing we did in Smalltalk was to use a relational scheme that heavily shared (borrowed from Lisp). This was coupled with “too many dependencies” etc.

Today, one would start with “marshalled objects” that can be transported over the entire Internet, and find ways to keep them as general as needed when reified back into life. This is a really interesting design problem, and a very important one.

One of the implications of scaling is that conventions that work locally start being of less use. For example, names are pretty local tokens, one needs search for global scales (and search by properties rather than tokens). Both Smalltalk and Lisp have this idea at a fundamental level, but not much has been done at the programmer levels to make it useful.

There are many more considerations that come with a “scaling of a billion plus the Internet”. I’ll end with just one of these. Programming traditionally has been about specifying “hows” and leaving the “whys, and the whats” in the mind of the programmer. Encapsulation allows more of a why and a what to be captured. The programming of the future has to start with representations — understandable by programs — of “why” and “what” — and has to be much more about this than any apparent “hows”.

This is a really important long known research problem for which almost nothing has been done or funded over the last 40+ years.

Are there any known jazz guitarists who prefer playing on nylon string guitars, and if so, why do they prefer it, the sound or the feel of the instrument?

Sure: e.g. Charlie Byrd and more.

I was a jazz guitarist who later learned classical, and much later started experimenting with jazz on classical. Basically, with my small hands, it was tough to do a lot of the jazz “reaches” on the classical instrument, both via action and scale size. Also, some of the differences in the actual physical frets can add more difficulty.

The reward is the reward of the classical guitar expression and its sound. Pretty much every physical property of a classical guitar is more difficult to deal with than with a well set up jazz instrument.

The sound and expression possibilities got me looking for “crossover” instruments with more accommodating overall properties — of which there are now many. E.g. Godin makes nylon string instruments with better actions, cutaways, etc.

The best maker I found was Kirk Sand in Laguna Beach California, who makes superb crossovers of very high quality. I wound up getting two of his instruments.

http://www.sandguitars.com/products.htm

Eventually age got me and my left thumb joint wore out to “nothing” — some of it from the extra efforts no doubt. Now at age 81 I happily play the lute — classical — starting because of the much lower action pressure, but winding up loving the instrument and its music.


It happens that much of early classical music requires improvisation as part of the art, and this is especially true of the lute. In fact, 17th century music in general is rather “jazzy” in many ways, some just like 20th century jazz, and some that parallel it in other ways. So it’s been fun learning and adapting to this as well.

What is the big difference between a computer scientist and a software engineer?

Worth starting with “differences between Engineering and Science, in general”.

Engineering was able to predate Science by thousands of years because “things can be made/repaired/etc. in principled ways” without having to understand just why a working thing works. “Cookbooks” of things that are known to work, serve well (as indeed they still do in cooking).

Math started off very cookbooky also, and then was revolutionized by the pre-CE ideas of connected reasoning and proofs.

Science started off very cookbooky also, and the word originally meant “gathering of knowledge” (often via rather indirect ways). Both Engineering and Maths eventually had big influences on what Science thought it was about.

Two enormous leaps in the 17th century for what we call “Science” today were (a) Bacon’s call for “a new science” that should be centrally about dealing with “what’s wrong with our brains” via the invention of heuristics and technologies that can help us do better, and (b) the earliest really good scientific theories as (symbolic) working abstractions of what could be behind phenomena (Newton had predecessors, but he was so far off the charts that new charts were required). This started new qualitatively different paths, and a few more critical ones were added in the next few hundred years.

For example, the clarification between what is “in our heads” vs “what’s out there”. Einstein had a good saying for this: “As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality.”

An important way to think about this is that “Science” is neither the math, nor the theories, nor the phenomena, but the process of negotiation between them.

One of the byproducts of this “largest thought revolution” was that Engineering — which eventually would be revolutionized by Science — lagged more and more by not taking up Science as a main facet. Because Engineering is mainly about making things, and many things can be made without real understanding, it took quite a while for the engineering fields to even see what they were missing.

Today, we can think of the larger picture as a kind of Venn diagram of Tinkering, Engineering, Mathematics, and Science. The isolated subjects have their own properties and appeal to different personality types, but in practice, most of the best practitioners in any of these are very well versed in the others. This is worth pondering deeply.

One of the largest unifiers and lingua francas is systems — a large enough set of perspectives to constitute a whole area of its own.

When something breaks, valuable information and insights are often a byproduct. Many things engineers do affect the safety of people, so the equivalent of the Hippocratic Oath in Engineering is: “the building must not topple, the bridge must not fall, the plane must not crash”.

Good engineers try to break things under controlled conditions so they won’t break when deployed. Good scientists are always trying to break their creations, but know that they never have the final word.

Note: there can be “sciences of the artificial”* — for example, a “science of bridges”. Once engineers build a bridge, it exudes phenomena which can lead scientists to better notions/“theories” of bridges. A lovely thing about our own time is the extent to which engineering and science can co-evolve both technologies and theories.

So: there can be a “science of computation” to go along with the many engineering activities that are required to make artifacts (both hardware and software).

The first Turing Award winner — Alan Perlis — was a “large mind” — and pointed out that the best scientific conception of the new field was “the study of processes: all processes”, and that computers themselves were both part of the “process world” and also could serve as the “new math” for representing the processes and notions being studied.

To me — an old fogey from the main research community back then — this seems like the best way to proceed for now (and this perspective really made a difference in my research community’s inventions of much of the interactive computing, media, and networking technologies in use today).

It’s easy to see that this perspective has not been taken up by most of the people doing computing today. The main activities are more at the tinkering and “making without real understanding” levels than they should be.

The vast gold rush which started in the 80s has produced something more like a pop culture** with pop music and pop values (for example, just compare the ancient engineering morality of safety with the modern computing tendency to try to make things “from matchsticks and houses of cards” despite that much deeper knowledge on many of these issues does exist).

Part of the pop process was that many of the older meanings from back then got redefined downwards to the point where they no longer mean the same things. Examples are: “computer science”, “software engineering”, “OOP”, “AI”, etc.

Just as pop cultures can’t deal with non-linear processes such as the pandemic and the climate, the coupling of Moore’s Law to oversimplified notions of computing has produced not just a mess, but one that becomes ever more dangerous.

---

* See the great book by Herb Simon, “The Sciences of the Artificial”.

** pop cultures can produce important things also — the big problem of our day is that commercialism has grown the pop cultures to be dominant rather than a small percentage for hobbies and harmless fun.

How can I implement an online microtonal adjustment algorithm to get more perfect harmonics with a method to avoid drift?

We did just that at Parc in the 70s, though not very completely. Today it could be done much better (and probably has been).

The idea is to try to imitate what singers and players of instruments that can shift pitch microtonally (bowed strings, winds, etc.) do when they are playing chords “in tune”. The system needs to take into account the key, the held notes, the kind of chord, and what comes next.

An example of a “problem” that has to be solved is: suppose you are in the key of C, the held note is a C, and the chord is an Ab7 (which would likely have a flatter version of the held C). The system should probably stay with the held C pitch and produce the Ab, Eb, and Gb a bit higher to get a nice major 3rd.

If the next chord is a G7, we can recover to the center pitch we are using for the key of C major.

But suppose the next chord is a Db major (so we just did a modulation). Now we are “drifting” if we relate this to the Ab7 (and we should, because Db major has an Ab note in it).

On the other hand, if we did the look ahead to see if there is a modulation or not, we can figure out schemes to stay in the “key center pitches”.

We can hear the wide 3rd in equal temperament (13 cents sharp) as a “pain”. Flatting this really helps matters. The slightly narrow 5th in equal temperament is less of a problem but we can usually widen it just a little here and there to get more consonant 5ths.

A main principle is that we have a hard time hearing “nanotones” (a few cents off), so we don’t have to get to perfect intervals to get much better audible results. This is what choirs, strings, and wind players do on the fly.

It is likely that a good solution today would not be done just by an algorithm, but would also have stored a lot of common case solutions to the adjustments, and would use these when possible.
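Here is a toy sketch in Python of the “held note” adjustment described above (entirely my own illustration, not the 70s Parc code; the particular just-intonation ratios, e.g. 7/4 for the dominant 7th, are assumptions): the held pitch stays fixed, and the other chord tones are derived from ratios above the chord root that the held note implies.

    # Just-intonation ratios above a chord root, keyed by semitone degree.
    JUST = {0: 1/1, 3: 6/5, 4: 5/4, 7: 3/2, 10: 7/4}

    def tune_chord(held_freq, held_degree, chord_degrees):
        """Keep the held note's frequency fixed; tune the rest of the chord
        from just ratios relative to the implied chord root."""
        root = held_freq / JUST[held_degree]
        return {deg: round(root * JUST[deg], 2) for deg in chord_degrees}

    # Held C (~261.63 Hz) serving as the major 3rd (4 semitones) of an Ab7:
    print(tune_chord(261.63, 4, [0, 4, 7, 10]))
    # The resulting Ab (~209.3 Hz) is a bit higher than equal-tempered Ab
    # (~207.7 Hz), which is the adjustment described in the Ab7 example.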

My pipe organ has an unequal 17th century temperament to deal with the need back then for “Golden major chords” e.g. for Tunder and especially Buxtehude. So e.g. C major and D major are really in tune — and “Golden” — whereas there is no C# major (because there is no E# note — the F is too sharp). So C# F G# is really out of tune. I’ve been very surprised how many people I’ve demonstrated this to can’t hear it.

---

Just to get a bit more detailed here. The general problems had been realized since Pythagorean times. Simply put, notes of the same pitch from different instruments (human voice included) will have different timbres because the mix of harmonics (usually pretty much multiples of the fundamental frequency) will be different. For example, a stopped organ pipe will only have odd numbered harmonics in different mixes, a brass instrument usually will have a strong 5th harmonic, etc.

When you try to make a chord from 3 different pitches you will get interactions between the harmonics of all the notes. If you play C E G, the E fundamental will interact with the multiples-of-5 harmonics of the C, and the G with the multiples-of-3 harmonics of the C. This could sound strained. A solution is to derive the pitch of the E and the G from the harmonics of the C. This will give a “golden major chord”.

Sounds good, so get the rest of the scale likewise. For example, follow the 5ths — try this on a piano — and you will eventually get back to C. But note that, to do this, the notes on a piano cannot be harmonically aligned. Why? Because following the 5ths around harmonically is multiplying successively by 3, while octaves are multiples of 2: no power of 3 is ever going to equal a power of 2, so harmonically, the “circle of 5ths” can’t close. This drove the Pythagoreans — and everyone since — batty. If you shrink the 5ths enough to allow them to close, then all the 3rds are annoyingly sharp (welcome to equal temperament!)
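The arithmetic behind both complaints is easy to check. A few lines of Python (just a back-of-envelope calculation, not part of the answer above): twelve pure fifths overshoot seven octaves by the Pythagorean comma, the equal-tempered major third comes out roughly 14 cents sharp of the pure 5/4 third, and the equal-tempered fifth is only about 2 cents narrow.

    import math

    def cents(ratio):
        return 1200 * math.log2(ratio)

    # Twelve pure 5ths vs seven octaves: the "circle" misses by ~23.5 cents.
    comma = (3 / 2) ** 12 / 2 ** 7
    print(round(cents(comma), 2))        # ~23.46 (the Pythagorean comma)

    # Equal-tempered intervals vs pure ratios.
    print(round(400 - cents(5 / 4), 2))  # major 3rd: ~13.69 cents sharp
    print(round(700 - cents(3 / 2), 2))  # 5th: ~ -1.96 cents (slightly narrow)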

An earlier ploy was to look at schemes that allowed good sounding major chords, with especially good thirds. This can be done on a 12 pitch per octave keyboard for a few keys (like my pipe organ I mentioned above).

What if you want more good sounding keys? Well you could split the sharps: so in the place where there is an F# key, you could have the front be that and the back part of the key be Gb.

<to be continued>

https://www.youtube.com/watch?v=G583ZJ1Psdk

What are the mental models, biases, and fallacies that software developers should know?

Only 17?

Is there a more precise way to describe software than with programming languages?

I’m using the definition that “a programming language is something that can take a representation and turn it into something that a computer can carry out”. But we could also think of a language that can describe processes that don’t run on a computer (this was the case for “APL” for its first 6 or more years).

The form of the rest of your question is not clear enough for me — for example, it doesn’t say whether the description is about some software or whether it is supposed to be convertible into the software in some way.

Perhaps you could try reformulating the question …. ?

---

Added June 2022 …

I think several of the other answers are quite good enough.

But it’s interesting to think about what “precise description” might or could mean here. I used to be a mathematician, and part of the college education was about how many maths got much more precise starting in the 19th century (for example, calculus was pretty offhand about limits, convergence/divergence, continuity, etc.). Just what was meant by “non-rational”, countable and non-countable, etc. was murky.

In parallel, many ad hoc workable things in engineering, eventually got much more precise — for example, sound recording was in use for quite a few years before much better ways to understand and describe what was going on were invented.

A big fun thing about “digital” was the many ways to remove noise from noisy carriers, to get virtually 100% precision in creating markings. An expanding question is “what has to be done to get precise about what a marking stands for?” One pathway brings us back to mathematics, and to whether what we mean and intend can be captured with symbolic representations/definitions that can stand enough on their own.

A simple minded answer with regard to computing is that a program in a programming language only has a precise meaning with reference to what happens when the program is executed (either relative to a machine or by a human in possession of a “workably precise” description of the programming language). This can be made to work pragmatically enough …

What does Alan Kay think about computer algebra system (e.g. SymPy and SageMath) usage in mathematics education, before graduate school? Would widespread use of powered cognition assists like these rot younger students' brains as some people fear?

“Education” and “Training” have some overlap — along the lines of “understanding” and “doing” — but the distinctions are quite useful to keep in mind. In all cases we want something beneficial to change/grow in the mind of the learner.

This implies that we should always be asking the question “When should it be easy, and when should it be hard?”. The former will not induce much change, whereas carrying through with the latter will, perhaps at the cost of time and discomfort.

In the 70s when only a few children were using computers, we asked the group we were working with at Xerox Parc what they thought about what the computers were helping them do. They discussed this quite a bit and decided that it would not be a good thing to use computers to automate the center of your interests, because it would get in the way of learning.

Musicians are quite familiar with this, and many music teachers will encourage students to learn to sing, and a “singing instrument”, as well as the conveniences of instruments like the piano — which do a lot for you, but they also make it easy not to pay attention and learn important things about musical expression. Many of the difficult things in music are just the things one needs to learn in order to induce a more musical brain/mind.

Another example is that it is well known that learning to fluently read written ideas produces large differences in cognition compared to that found in illiterate societies.

So, when it comes to mathematics, we have to use these principles to try to figure out “When it should be easy and when it should be hard”. We don’t want gratuitous difficulties, but we do need to figure out ones that will promote real growth in mathematical thinking — and if possible in useful mathematical knowledge and skills.

In all cases, we also have to take into account the differences between Kahneman’s “Fast thinking” and “Slow thinking”, especially that many of the ideas for which we have to use slow thinking to reach understanding, will also need training to develop fast skills in doing.

When we have a “helper” in the form of a person or a tool, we have to figure out when we just want to be practical and “get an answer” or when we need to understand something deeply before using a special tool. Most of our species chooses the former — we are nothing if not expedient — and this is very often not a good habit to get into.

Here the subject is mathematics, one of the few human inventions whose “knowledge” can be completely understood — and this is also one of its gifts: the clarity of understanding the relationships of abstractions without the murkiness of trying to use them in science to approximately describe the real world. If you don’t understand, then you can confuse the language with the phenomena, and miss the murkiness that is the very core of sanity.

Ameer Fazal asked in a comment below:

So, it is better to learn arithmetic, geometry, algebra, trigonometry, analytical geometry, set theory, calculus and so on with good quality books, working out problems with pen and paper. And with people and tools for various other reasons but primarily somewhat in an autodidactic fashion.

I think this question in the large has to do with what citizens in a democracy should be required to learn as children. I think children do have rights regarding what they can be coerced to do, but I also think e.g. that society has the right to try to teach all children to read and write. This is a clear conflict of sorts, but I’m on the side of mandating the teaching of reading and writing as part of preparation for adult citizenship.

I also don’t see how an adult can function as a voting citizen without a threshold level of science and maths, and this includes learning the habits of reasoning, questioning, understanding about “noise” (especially generated by our brains, etc.)

It’s interesting to try to work out what the threshold of these fluencies might include. I don’t think they are what schooling currently thinks about either maths or science — both would be quite different (and almost unrecognizable to most adults today).

In this light, it is worth pondering the future of “algebra” — both in its school use, and also the much more illuminating ideas of “Algebras”, especially “Linear Algebras”.

One starting place for a better “K-12 mathematics” would be to just look at the maths that provide the mapping/supporting representations for scientific models, with the entry being the world children experience. For example, Seymour Papert had many wonderful insights, including: that the experience of movement by a child traces out a version of Gauss’ “Differential Geometry Of Vectors” and this can be enhanced with toys, ways to talk about the movements, and ways to think “mathematically” about what is going on.

For example, consider making polygons by acting out “go a distance, turn an angle”. One soon discovers that to make any polygon (in fact, any closed figure) the “total turn” will be a full circle of 360 degrees. (This is a more powerful theorem than the internal angles of a triangle, and easier to prove.) From this it is easy to figure out polygons, etc.
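A minimal sketch of the “go a distance, turn an angle” walk in Python (plain vector arithmetic standing in for a Logo-style turtle; the names are my own): when the exterior turns add up to exactly one full circle, the figure closes back on its starting point.

    import math

    def walk(n_sides, side=1.0):
        """'Go a distance, turn an angle' n times; the turn is 360/n degrees,
        so the total turn over the whole walk is 360 -- one full circle."""
        x, y, heading = 0.0, 0.0, 0.0
        turn = 360.0 / n_sides
        for _ in range(n_sides):
            x += side * math.cos(math.radians(heading))
            y += side * math.sin(math.radians(heading))
            heading += turn
        return x, y                              # where the walk ends

    for n in (3, 4, 5, 6, 17):
        end_x, end_y = walk(n)
        print(n, math.isclose(end_x, 0, abs_tol=1e-9)
                 and math.isclose(end_y, 0, abs_tol=1e-9))   # True: it closes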

This is one entry to thinking in terms of vectors, and it works powerfully in 1, 2, 3, 4 etc dimensions. The genius of Papert produced here something that is simultaneously in the sensual world of the child, that is one of the main forms of mathematics used for science, and something that is in the reasoning wheelhouse of children for really doing math — as Papert noted, “Teaching children to be mathematicians as opposed to teaching them about math”.

This is the same distinction between helping a child learn to actually play a musical instrument vs putting them in a “music appreciation” class. We should be able to see that — for many really important subjects — “standard schooling” is way off (and to the point that it actually hurts the children: i.e. it’s quite a bit worse than “neutral”).

What was the world’s first high level programming language in 1957?

Is this really two questions?

The simple answer is FORTRAN: it was by far the most developed, polished, and optimized HLL as of 1957.

At the next — less simple — level there were quite a few things in the works.

Besides what Grace Hopper was doing — e.g. with “FlowMatic” — “IAL” (what is now known as “Algol 58”) was in the works (and it was more “sweetly conceived” than FORTRAN).

On Whirlwind at MIT, there was already a “quite HLL” interactive language that was pretty darn nice — called “George” — by Hal Laning and Neil Zierler (one history said it was implemented in 1952).

Laning and Zierler system - Wikipedia

Sanjay Vyas asked: There was also Plankalkül?

Reply: I thought about that — and I’m a fan of Zuse — but decided to restrict to languages that were actually implemented on either side of 1957. Here’s a sentence from Wikipedia:

“In a 1957 lecture, Zuse expressed his hope that Plankalkül, ‘after some time as a Sleeping Beauty, will yet come to life’.” (Plankalkül - Wikipedia)

A strong argument in Sanjay’s — and Zuse’s — favor is that Iverson’s “A Programming Language” existed for years — and a lot of programs were written in it — before it was implemented. Most of us date this language to 1962 (the book) rather than to the late 60s when “APL360” was implemented.

If we follow that line of reasoning, then we have to look at Ada’s program for Bernoulli numbers, which was also just on paper. She certainly had a formal organization for this program.

Can atoms of imaginary matter be created on a computer only with software imaginary graphics?

I typed “simulated atoms” into YouTube and got many different videos showing various levels of physical simulations of atoms rendered using “software graphics”.

Several of these show actual constructed atoms from elementary particles. My favorite was the one that showed a proton’s quarks interacting with electrons. That was one visualization I hadn’t seen or even deeply thought about before …

There are also some very nice crystal and biological renderings using the basic “chemical physics” of atoms.

I don’t quite understand your phrase “using only software graphics”.

(a) at some point what the software does has to move through hardware for us to see an image,

(b) the physical simulations are mathematical — like the viewing simulations — but using somewhat different math and physical modeling (but all can be done in software — and all can be done in either a regular CPU or in a GPU).

What do you think of the RMOX operating system from U Kent in England? I am amazed it is 32 bit only and not 64 bit (maybe secretly used by England’s government) https://www.cs.kent.ac.uk/projects/ofa/kroc/rmox-cpa.pdf

This is a “CSP” influenced design. I don’t like to criticize unless I’m willing to do it in substantiated detail, so I’ll just say here that there are other styles for dealing with concurrent modules that I like more.

I want to understand and learn Lisp. I saw one of your YouTube videos where you said few understand it even today. What is the best way to learn Lisp and the creation of really good software that, like the Internet, is reliable?

Lisp presents itself as “a kernel language with a library” and as a “metalanguage” and “metasystem”. It’s the latter that I think few in computing — in terms of percentages — have taken the trouble to learn and ponder deeply.

There are two main aspects to “meta” in our context here.

The first is how to describe the important parts of a language in a small and direct enough way to provide illumination. McCarthy’s original definition of Lisp in itself does this extremely well.

The second important aspect of “meta” is “reflection” — which means that programs in the running language can examine the mechanisms that are making them run. Lisp is very amenable to this, and some Lisps have been implemented to be reflective in considerable depth (note that (1.) doesn’t imply or require (2.)).

In the context here (1.) will exhibit — “simply” — how to make a mechanism that will run any program in the language and will produce the same results. A Lisp “meta-program” for this is very small and clear.
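As a rough indication of how small such a meta-program can be, here is a sketch in Python of the eval/apply idea for a tiny Lisp-like subset (my own toy, nothing like the actual Lisp 1.5 code; it omits essentials such as define, proper error handling, and real list structure, and symbols are just Python strings).

    def l_eval(exp, env):
        if isinstance(exp, str):                 # a symbol: look it up
            return env[exp]
        if not isinstance(exp, list):            # a literal (number, etc.)
            return exp
        op, *args = exp
        if op == 'quote':
            return args[0]
        if op == 'if':
            test, then, alt = args
            return l_eval(then if l_eval(test, env) else alt, env)
        if op == 'lambda':                       # (lambda (params) body)
            params, body = args
            return ('closure', params, body, env)
        return l_apply(l_eval(op, env), [l_eval(a, env) for a in args])

    def l_apply(fn, args):
        if callable(fn):                         # a primitive
            return fn(*args)
        _, params, body, env = fn                # a closure
        return l_eval(body, {**env, **dict(zip(params, args))})

    env = {'+': lambda a, b: a + b}
    print(l_eval([['lambda', ['x'], ['+', 'x', 'x']], 3], env))   # 6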

“Reflection” here can be approached by thinking about what kinds of live access are needed to allow a really good debugger to be written in the language so it can be a live process in the language. (Note that JS — which has some Lisp-like features — is quite poor with respect to complete reflection.)

A next stage of reflective thinking would be to ponder what it means to extend the core meanings and mechanisms of the language at the same level as the existing kernel. Some excellent thinking about this can be found in the next four paragraphs.

i. The basic idea is that you don’t need much to make a “universal computing material” and that can be used very compactly to represent itself in itself. This produces something that is both extremely powerful in its reach, and quite small enough to think about mathematically, as well as for writing programs.

ii. I liked this basic approach very much, and we used it for Smalltalk “down to the metal” at Xerox Parc (we didn’t base Smalltalk on S-Expressions, etc. — the treatment was in direct terms of “objects”). It was quite good with regard to both (1.) and (2.) above.

iii. The “Art Of The Metaobject Protocol” does even better in some respects for “Lisp with Objects” — the early examples in the book are quite illuminating.

iv. I also recommend Dave Fisher’s 1970 CMU thesis “A Control Definition Language” for an advanced McCarthy-like meta and reflective approach to extensible languages.

An historical approach to learning “deep Lisp” is to get the “Lisp 1.5 Programmer’s Manual” and work through the first 13 pages under the direct tutelage of John McCarthy. Page 13 takes most people a few deeply concentrated hours “on a Sunday afternoon”.

Here are a few books since “The Lisp 1.5 Manual” that also include good travels into “Lisp-meta”.

A classic — and now (also) historical approach — for “Lisp” (now in terms of “Scheme”) is to get the Abelson and Sussman book “Structure and Interpretation of Computer Programs”, which has served as the textbook for learning this world of thought. This is longer and gentler (most computer people pronounce it “difficult”).

There’s a book “The Little Lisper” that many like.

This book is quite good to take you though this kind of thinking:

Essentials of Programming Languages

The last part of the question is a bit trickier, and I don’t think I know of a book that gives a good account of what is known (for Lisp or any other language). There are several parts to this (a) how to make an “eternally alive” system that basically can’t crash and has enough meta to participate in its own improvement (b) how to write really good software.

“(a)” has much to do with “confinement”, “fail-fast”, and ability to always have processes that can look at other processes. The classic Smalltalks, and also Squeak Smalltalk — whose chief architect and builder was Dan Ingalls — can be studied. It is quite possible to go a bit further in a system like Squeak and even have it be able to replace its own kernel without having to stop it.

“Meta” has gotten richer since the early days of Lisp. For example, Dave Fisher’s CMU thesis on extensible control structures. I’ve mentioned Smalltalk. I like the motivating chapters of the “The Art of the Metaobject Protocol” a lot. I like the way the LOCUS OS was organized. LINDA has a lot to ponder. And, especially, there is Erlang — especially its systems principles — to look at. I think the “simulation of time” ideas of McCarthy are key.

Etc.

“(b)” has a subjective side (“good” software needs “good” people), and also a number of principles. One is to separate meanings from optimizations in such a way that (i) the meanings/intentions are debuggable and are readily readable and understandable by the humans involved, (ii) the system will run as desired with the optimizations turned off (a supercomputer may be required). This is a computing version of the CAD<->SIM<->FAB process in established Engineering (with a Big E) disciplines.

Are business interests and principled engineering often at odds?

Take a look at “Challenger” and Thiokol, and the 737 Max and Boeing — as starters …

What would "software CAD/SIM" look like?

Let’s try as close an analogy to modern Engineering (with a Big “E”) as we can. One of the processes is called CAD<->SIM<->FAB(rication), and the idea is to solve as many design problems as possible in the CAD<->SIM part.

CAD is a set of tools for making progressively better models of what is in mind that can be put through their dynamic (and otherwise relational) paces via SIMulation. A key principle here is to capture the “meaning” of the model and debug it. In many engineering areas, the SIM part is computing intensive, and may require a supercomputer to pull off.

The “FAB” part of this scheme involves much of the optimizations that are needed to reify the model in the world (and usually to manufacture and distribute many copies).

The computing world spends quite a bit of its time and energy working directly in the FAB area. Once optimizations are successful — often quite a bit of work — it is then easy to make as many copies as desired.

If we try to stay with this framework — CAD<->SIM would be the tools to make a working model of our idea, which perhaps will require a supercomputer for real-time response. The model should capture our goals and requirements in as simple and meaningful way as possible.

Some of the model code could be in the form of “constrained goals” (like Sketchpad or Thinglab or Cassowary), and some could be in the form of simple unoptimized programs.

A simple example might be a component — server — that semantically does sorting. The model code would be very simple (whether as a constraint or a simple program). The SIMulation has to be powerful in order to be able to work on realistic sized problems. Note that the FAB part can be developed incrementally inside the component/server and could include dozens of different sorting algorithms and representations depending on the requirements (for example the system might want to use B-trees for some services). The model code can be run in parallel to check each optimization. A well designed system would be able to run with all optimizations turned off (this would be just the CAD/SIM part).
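A tiny Python sketch of that sorting example (my own illustration of the separation; all the names are made up): the “meaning” is written for clarity, the optimized version can use whatever representation it likes, and the meaning can be run alongside the optimization to check it, or alone with the optimizations “turned off”.

    import random

    def sort_meaning(xs):
        """CAD/SIM level: an obviously-correct statement of what sorting
        means (repeatedly take the minimum), written for clarity, not speed."""
        pool, out = list(xs), []
        while pool:
            smallest = min(pool)
            pool.remove(smallest)
            out.append(smallest)
        return out

    def sort_optimized(xs):
        """FAB level: any optimized algorithm/representation (the built-in
        Timsort stands in here for B-trees, radix sorts, and so on)."""
        return sorted(xs)

    def sort_server(xs, check_meaning=True):
        result = sort_optimized(xs)
        if check_meaning:                  # the meaning runs "in parallel"
            assert result == sort_meaning(xs), "optimization broke the meaning"
        return result

    print(sort_server([random.randint(0, 99) for _ in range(20)]))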

Reality is more complex, but the principle of separating meaning from optimization is powerful, and the kind of system I’m describing is set up to keep them separate.

We got to try some of these ideas at Parc — and found many of the principles in the analogy to work well.

In any case, if you are going to have “requirements”, I think that they have to be simple, runnable and debuggable in order to be of much use — and this leads directly to the need for powerful models and SIM, in part because so much software today has real-time requirements as well as relational ones.

Historically, it’s worth noting that the early drive for higher level languages was to program in terms of models as much as possible — they were also called “Problem Oriented Languages” — and to have the compiler do the optimizations. This is hard to do for many problems, even if you can give the compilers help and hints. It is also hard psychologically for programmers not to sneak in optimizations intertwined with the meanings in the code (this soon pretty much disappears the meanings (the whats) in favor of the hows).

The idea of separating meanings from optimizations (and to have mechanisms in the development languages for doing so) helps with the practical business of actually writing software.

Has Lisp ever been implemented iteratively?

I’m not sure I understand this question clearly.

But the first implementation by Steve Russell — and many since — was done by writing a loop that manipulated structures in memory to create the equivalent of a recursive eval-apply. Take a look at the latter part of the “Lisp 1.5 Manual”, which discusses the scheme used with the 7090 at MIT (a machine which didn’t have a stack mechanism (most didn’t back then)).

The entire machine code of the classic graceful Peter Deutsch Lisp implementation on the BBN PDP-1 is given in the book “The Programming Language Lisp” (Bobrow and Berkeley ed.). The PDP-1 didn’t have a built-in stack, etc.
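A toy illustration of that flavor of implementation (in Python, purely illustrative; neither Lisp nor the 7090 scheme): nested expressions evaluated by a loop over explicit work and result lists rather than by language-level recursion.

    import operator

    OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

    def eval_iteratively(expr):
        work = [expr]        # expressions and ('apply', op, argc) markers still to process
        results = []         # values computed so far
        while work:
            item = work.pop()
            if isinstance(item, (int, float)):
                results.append(item)
            elif isinstance(item, tuple):                  # an apply marker
                _, op, argc = item
                args = [results.pop() for _ in range(argc)]
                results.append(OPS[op](*args))
            else:                                          # a compound like ['+', 1, ['*', 2, 3]]
                op, args = item[0], item[1:]
                work.append(('apply', op, len(args)))
                work.extend(args)                          # last argument ends up on top
        return results.pop()

    print(eval_iteratively(['+', 1, ['*', 2, 3]]))   # 7
    print(eval_iteratively(['-', 10, ['+', 2, 3]]))  # 5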

What language was ALGOL 58 written in?

“BALGOL” (Burroughs Algol) was a version of Algol 58 that was implemented on the Burroughs 205 and 220 computers in pretty much the same machine code — the two machines were quite similar (good portions of this by Don Knuth).

How does one get as close as possible to mastering software engineering?

“Engineering” is making something in a principled way — so there is definitely a “software engineering” of a sort.

The idea was articulated in the 1960s as an aspiration because the “principled ways” known in computing did not compare strongly with where the established engineering disciplines — civil, mechanical, aeronautical, electrical, etc — had gotten to.

Today, many computerists will say “software engineering” is defined by just what it is that they do (ditto with “computer science”). By removing the aspiration, they also have removed much of the motivation and ability to see, criticize and improve. For these people, “mastering software engineering means getting up to speed with current practice”.

As an old fogey, I prefer the earlier, sterner and more enlightened stance of aspiration and like to compare and think about how making software could be brought more to parity with established engineering.

One property of the established fields — “Engineerings” (with capital “E”) itself — is that they were completely revolutionized by the invention of science. The takeup of science into engineering took quite a while — a lot was slow and grudging over 150 years — but the big “E” is now an amalgam of traditional tinkering, making with principles, mathematical models and modern science. The latter two, especially the latter, have vastly changed what “making with principles” means and the “principles and tools” used. The advent of the computer — especially powerful interactive graphical computing — came along at just the right time to supercharge this change with powerful CAD tools, SIMulators of the CAD models, and increasingly FABrication driven directly from the designs vetted by simulation.

It’s worth noting the deep irony that the new computer tools for the engineering disciplines are almost always more comprehensive than the ones found in use by computer people for writing the programs! (There are a lot of “black screen simulated card-deck-glass-teletype” screens in use, in gross contrast with e.g. how something in EE or ME is designed and made today.)

Going deeper, most software people are just trying to do FAB, and most of the tools are FAB tools — there is very little CAD and even less SIM in “software engineering”. To my old eyes, this doesn’t look or feel like a real engineering process.

From another perspective, I don’t see a lot of “math and science” in the current “software engineering” process. I will leave expansion of this aside here with just “it doesn’t help to have ‘computer science’ these days not really take the term ‘science’ seriously” — the whole phrase is perhaps more of a designer jeans label.

Now let me pick on the first term in “software engineering”. Many software systems today are “legacy” in one form or another, and in any case are characterized by “rigidity” with respect to change (this is the opposite of “soft”) and by “fragility” with respect to safety and integrity (this is the opposite of what “E”gineering is all about). In fact these are backwards. What we want from “soft”ware (and from understanding “soft”ware) is “extreme flexibility” over the lifetime of an artifact (“soft” is why we make programmable computers) with “extreme safety and integrity”.

This combination has possibilities of being radically extended for “Engineering of Software”, and I think our main job as “aspiring engineers” is to move software towards “extremely safe extreme flexibility”.

I think this is a good starting place for rethinking all of this.

I think “software engineering” — like the existing developed fields — should be actively aspiring to find “What Is Actually Needed” (WIAN) for both its projects and for the field itself. “Mastery” is then the process of continually moving closer to the ideals of the field — it is not a location.

What are the limitations discovered in early programming languages that triggered the need for OOP?

The earliest programming was in the forms of the earliest computers: to find resources in memory — usually numbers, or numbers standing for something (like a text character) — and doing something with them: often changing them or making something and putting the results in memory. Control was done by simple instructions that could test and compare, and branch to one part of code or another: often to a part of code that had already been done to create a loop. An instruction not in the hardware could be simulated if there was a way to branch and capture where the branch originated, thus producing the idea of “subroutine” (first used in full glory with a “library” on arguably the first working programmable computer, the EDSAC by Maurice Wilkes at Cambridge, late 40s).

Beginning programming was and is most often taught in this style, and it has been noted that the first programming language and style one learns tends to manifest most deeply throughout the rest of a career. Not a lot has changed 70 years later, partly because many languages started off with this style in mind, and thus the new languages were attempts to make this style more convenient to use (Lisp and APL were different early interesting exceptions).

Another way to look at this is to note that (1) the degrees of freedom of a computer, and of the possible problems to be solved, coupled with the limitations of the human mind, mean that anticipating all the tools needed will be essentially impossible. This means that how to define new things becomes more and more important, and can start to dominate the “do this, do that” style.

Along with this (2) soon came *systems — dynamic relationships “larger” than simple programs*. Programs are simple systems, but the idea doesn’t scale up very well to deal with qualitatively new properties that arise. Historically, this never quite subsumed “programming” (and the teaching of “programming”). It gave rise to a different group of computerists and did not affect “ordinary programming” very much.

I think it is fair to say today that the majority of programmers reflect this history: most do not regard definition as a central part of their job, and most do not exhibit “systems consciousness” in their designs and results.

I think quite a bit of this has to do with the ways programming is taught today (more about this gets even more off topic).

Looking at this, the earliest real “computer scientists” could see that e.g. subroutines were an extension mechanism, but they were weak — for example, to make a new kind of “data structure” was fragile and could not be made a real extension to the language. This led to a search for “extensible languages”.

Other computer scientists could see that “data structures” were not a great idea e.g. sending a data structure somewhere required the receiving programmer to know many details, and the structure itself might not fit well on a different kind of computer. A vanilla data structure was vulnerable to having a field changed by an assignment statement “somewhere” in the code by “somebody”. And so forth.

Most of the programmers were used to the idea of commanding changes to “data”, and so some of the fixes were mechanisms that allowed data structures to be invented and defined: one of the major styles today is “abstract data structures”.

Along with all this were several ideas for dealing with simple smashing of variables (and the essential “variable” that is a data field). This was scattershot and reinvented in different ways. The most prominent way in strong use today is for very large structures: “data bases” that are controlled by the intermediaries of “atomic transactions” and “versioning”, which effectively wrap the state with many procedures to ensure that a valid history is kept and relationships between parts of the data base are not violated. Eventually, it was realized that “data” didn’t capture all the important questions that could be asked — for example: “date of birth” could be “data”, but “age of” had to be computed on the fly. This was originally done externally; for some data bases, procedures could be included. (This required a “data base” to eventually be able to do what a whole computer could do — maybe “data” is not the operative idea here, but instead “dynamic relationships relative to time” works better. If so, then the current implementations of “data bases” are poor.)
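A tiny sketch of that last distinction (in Python, purely illustrative): the date of birth is stored data, while the age is a relationship that has to be recomputed relative to time.

    from datetime import date

    class PersonRecord:
        def __init__(self, name, date_of_birth):
            self.name = name
            self.date_of_birth = date_of_birth   # stored "data"

        def age_on(self, when):
            # computed on the fly, never stored: it changes as time advances
            years = when.year - self.date_of_birth.year
            had_birthday = (when.month, when.day) >= (self.date_of_birth.month, self.date_of_birth.day)
            return years if had_birthday else years - 1

    p = PersonRecord("Ada", date(1815, 12, 10))
    print(p.age_on(date(1852, 11, 27)))   # 36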

In computer terms, modern “data bases” are subsets of the idea of a “server”.

Another line of thought — which goes back before there were workable computers — is that (3) certain easy enough to make computers can simulate any kind of mechanism/computer that can be thought of. This partly led to several landmark early systems such as Sketchpad, and the language Simula.

If you take in the above, and carry it to the extreme, it’s worth noting that only one abstract idea is needed to make anything and everything else: the notion of “computer” itself. Every other kind of thing can — by definition — be represented in terms of a virtual computer. These entities (I’m sorry I used the term “objects” for them) are used like servers, and mimic the behaviors (of literally any kind) that are desired.

A key point here is that just having practical means for creating objects doesn’t indicate what should be simulated with them. And here is where the actual history has been and continues to be unfortunate. The most common use of the idea — still today — has been to simulate old familiar ideas such as procedures and data structures complete with imperative destructive commands to bash state. This again goes back partly to the way programming is still taught, and to the rather high percentage of programmers today who are uncomfortable with design and “meta”.

For example, since “an object” is a completely encapsulated virtual computer, it could be set up to act like a transactional versioned data base. Or something much better and more useful than that.

Note that most interesting representations of things do “change over time” so something has to be done to deal with this problem. So-called “Functional Programming” has to add features — e.g. “monads” — to allow state to advance “in a more functional way”. This might not be the nicest way to deal with this problem, but something does have to be done.

And note that if you have gotten religious about “FP”, then it is really easy to make a pure FP system and language by using the universal definitional properties of “real objects” (being able to define what you want is the deep main idea!) But before you do, it will be good to ponder in larger terms.

As Bob Barton once remarked “Good ideas don’t often scale” — and neither do most simple programming paradigms. This means that another of the new things that can be built with “objects” — but have to be invented first — are less fragile ways to organize systems.

Along the Barton “qualitative changes” line of hints, one could start contemplating a kind of “request for goals” kind of organization where the semantics of the worlds being dealt with are more richly human and the main center of discourse is about the “whats that are needed” rather than the “hows” that the system ultimately uses.

This was one of the impulses behind some of the HLLs in the 50s and 60s, but the field gave up too early. The original idea behind a “compiler” was to take a “what” and do the work necessary to find and synthesize the “hows” to accomplish the “what”. 60 years ago the “whats” were limited enough to allow compilers to find the “hows”. But the field decided to sit on these and not uplift the “whats” that would require the compilers to do much more work and use more knowledge to synthesize the “hows”. This is another way to miss out on the changes of scaling.

In a “real object language” — with “universal objects” — it should be possible to define new ways to program and define and design any new ideas in computing — I think this is necessary, and that it has to be done “as a language” in order to be graceful enough to be learnable and usable.

Historically and psychologically, tools have had a somewhat separate status from what is made with tools (and the people who make tools, and make tools to help make tools, etc. are also somewhat separate from the average maker). But a computer is always also a tool making shop and factory: you don’t have to go to the hardware store to buy a hammer, etc. This requires a change in mindset in order to really do computing.

At Xerox Parc in the 70s, we made a “real object language” to walk both sides of the street: (a) we wanted to invent and make a whole graphical personal computing system, and (b) we wanted to be able to easily remake the tools we used for this as we learned more. I.e. we wanted to “co-evolve” our ignorance in both areas to reflect our increased understanding. We were motivated both by “beauty” and by the fact that we had to go super high level in order to fit our big ideas into the tiny Alto.

This process resulted in five languages, one every two years (thanks to the amazing Dan Ingalls and his colleagues), with one deep qualitative change between the 2nd and 3rd languages. That these languages could be useful “right away” was due to the way they were made (and partly because the languages contained considerable facilities for improving and changing themselves). To make progress on the personal computing parts, the constructs made in the languages had to be extremely high level so that the system could be rewritten and reformulated every few years.

The 5th version of this process was released to the public in the 80s, and to our enormous surprise it was not qualitatively improved again, despite the fact that it included the tools and the reflective capabilities to do this. The general programmers used the language as though it came “tight” from a vendor and chose not to delve into the even higher level semantics that could help with the new problems and new scalings brought by Moore’s Law. (This was critical because there were some things we didn’t do at Parc, because of the scale of that time, that needed to be done to deal with the “10 years later” scalings, etc.)

To answer the current question after the “long wind” here: there are usually enough things “not right enough” in computing to need new inventions to help. Most people try to patch their favorite ways of doing things. A few will try to raise the outlook and come up with new ways to look at things. The deep “object” idea, being one of “universal definition” can be used for both purposes. Using it for the former tends to just put off real trouble a little bit. I think programming is in real trouble, and needs another round of deep rethinking and reinventing. Good results from this will be relatively easy to model using “real objects”.

Did Xerox PARC have time constraints?

I was not aware of any during the period I was there, from late ’70 to ‘81.

However, we did often put them on our own projects — and for various reasons. For example, when we decided to make our own PDP-10 when corporate nixed buying one and wanted Parc to get a Sigma-7, the Parc feeling was that this would be worth it if the project was up and running in less than a year.

For the Alto (which started off as a covert project) Chuck Thacker had a bet with a Xerox exec that such a thing could be done in 3 months (he won).

This kind of voluntary internal time pressure for practical building of things did not get in the way of the freedom to spend arbitrary lengths of time “problem-finding” and mulling.

The tensions between Parc and corporate were less pleasant than our previous experiences in the ARPA community, where the relationships between the ARPA projects and ARPA “management” were more informal.

But the situation at Parc was overwhelmingly set up for positive processes nonetheless.

Did the Xerox PARC children have any problems context-switching between traditional math precedence rules and Smalltalk’s alternative left-to-right evaluation order? Would Mathematics change significantly if left-to-right evaluation were widely adopted?

We made the decision not to have precedence for the very same reason APL did: if you have lots of operators, just remembering precedence starts to become a burden. The kids had no trouble dealing with the (easier) left to right “consumption” idea, and I never heard any complaints about the conflict with precedence in school. These were 12 and 13 year olds.
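A small sketch of the no-precedence idea (in Python, my illustration rather than Smalltalk code): each operator is applied as soon as it is encountered, left to right, so "3 + 4 * 5" comes out 35 rather than 23.

    import operator

    OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul, '/': operator.truediv}

    def eval_left_to_right(tokens):
        value = tokens[0]
        i = 1
        while i < len(tokens):
            op, operand = tokens[i], tokens[i + 1]
            value = OPS[op](value, operand)   # consume one operator and one operand at a time
            i += 2
        return value

    print(eval_left_to_right([3, '+', 4, '*', 5]))   # 35, not 23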

How can a beginner develop thinking (computational and algorithmic) in programming?

This question was asked by “Leonardo da Vinci”, and I wouldn’t like to deprive Leonardo of the pleasure of finding this out for himself.

---

Now that we’ve had some fun with this, here are a few contextual perspectives to get you started …

“computational” and “algorithmic” have now become buzzwords, so a good starting place is to ignore these, and start with the term “thinking” (you could take a look at my Quora answer for my take on this term and how to develop it). Also, several of us wrote an ACM article with the title “Computational Thinking Is Just Good Thinking” (and the title makes the main point). One of many important ways to approach this is to realize that a threshold for modern thinking is to be able to know if you know enough to (really) critique the ideas you are dealing with, and to know how to approach learning what you need to know to critique. I daresay that most people in computing today cannot adequately critique what they and the field are doing, and this means that it turns into something like creeds, then religions, and finally into “cargo-cults”.

Historically, the hardware of computers was tiny, and what they were used for was very limited. So learning how to come up with and program algorithms in machine code was the prime thing to learn. Don Knuth’s immense tome “The Art of Computer Programming” was just this: it used machine code, and as he said, was partially a “cookbook”. But designing and making systems is the larger idea, and quite a bit of it is not “algorithmic” in nature. I suggest that you get some books about systems and start learning to think in terms of systems. This is a new kind of thinking that co-evolved with the new thinking of modern science. Learning both of these will help greatly.

I would put quite a bit of effort into learning about and dealing with “noise” of all kinds. This is a third kind of modern thinking, to go along with science and systems. In the large, “noise” is all the stuff at every level that you don’t want to have, and especially that interferes with what you do want. The two key ideas are (a) everything in the real world has noise — including both our biological and psychological brains — and (b) an astoundingly great thing is that if we can determine the maximum percentage of noise in a system, then we can use the very system to make processes that are free of noise. The “(b)” part was articulated by Claude Shannon, and is one of the greatest creations of the 20th century, not just pragmatically, but philosophically. It started off being used in communications and computing hardware, but it is quite general, and is found in nature also, especially molecular biology. It is used in every modern engineering system in one way or another, except … that very few computer people are willing to do the work to apply the ideas to software — so this is “a biggie”.

Scaling is another deep problem for the human mind, and for systems in general. We have to learn how to design for both growth and large size, and this means that approaches which work with small problems — and with most examples in programming courses — don’t scale well at all. In many cases you can’t use “elementary computing” to do real computing. How to design systems that have thousands of requirements and constraints is another large branch of knowledge. Still, one of the best general discussions of this is Christopher Alexander’s first book “Notes on the Synthesis of Form”.

Tony Hoare once remarked that “Debugging is harder than programming, so don’t use all your cleverness to write the program!” This is one of the main bugs in general computing today. You are much better off designing for failure and change rather than for success and to the original objectives. Most of the cost in software systems (more than 85%) is incurred after they are successfully “finished” and deployed. Real computing has to deal with reality, and with being willing to pay much more up front to save much more over the life of the software. One way to think about this is that general computing today has forgotten the meaning of “soft” in “software” and why we do software at all (it is not to make intractable “hard” masses of code that cannot be reformulated).

A bit redundant — but perhaps useful to say it this way — one of the things that was most helpful to the powerful ARPA/Parc computing community in the 60s and 70s, was that most of its members had deep training and experience with difficult already established fields which required much discipline to acquire fluency — such as the physical sciences, mathematics, and engineering. This helped tremendously in trying to understand what each of these terms, processes, and perspectives might mean for the new, wiggly, and prone to BS field of computing. I.e. what would happen if we took “science”, “mathematics”, and “engineering” seriously within computing? Being able to do this is one of the main pillars of “thinking” and “thinking within computing”.

Of course, there are more than 6 perspectives needed to become a decent computerist, but these are 6 important and non-ignorable ones that will start you off.

What are some examples that demonstrate the usefulness of homoiconicity in programming?

I don’t think it is that useful per se (the most important aspects of a programming language are (a) to have the most powerful semantic frameworks for expressing meaning, and (b) to be as learnable and readable and “gistable” as possible for human users).

The most famous example of “homoiconicity” is Lisp, but the language was not originally so. McCarthy thought and wrote in terms of what he called the m-language (which was basically normal math function syntax with square brackets, and arguably more readable — he thought it was). To do an interpreter of Lisp in itself, he wanted to have a representation for the programs in terms of lists, and chose what has become known as “Cambridge Polish”, a prefix notation with parens to delimit the sequence of parameters — McCarthy called these “S-expressions”.

When Steve Russell did a working interpreter for this on the 709 (or 90) he wrote a “read” routine — which translated text characters to lists — but “didn’t get around” to writing a simple parser for m-expressions into s-expressions. Thus the first — and most subsequent — Lisps were entirely in terms of what would be equivalent to ASTs.

The hackers were happy to work in a kind of “machine code”, but there were several efforts to make a more readable Lisp. The official “Lisp 2” effort never quite made it. But there were several really good versions of “Algolic Lisps” over the years: two of them were at McCarthy’s Stanford AI project: M-Lisp by David Canfield Smith and Horace Enea, and Reduce by Tony Hearn.

Note that even in the homoiconic s-expression Lisps, you have to use functions to get at the “functor position” and the parameters. These happen to be vanilla Lisp functions such as CAR and CDR — but you could just as easily write a function called “functor” that when applied to an expression, will find and deliver that part of it. Similarly for the arguments. And now you can do everything you used to be able to do, but now in a much more readable language.

So, the key to this kind of power is not homoiconicity, but the “meta” of having expressions be 1st class objects in your language.
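A minimal sketch of that point (in Python, with invented accessor names, not any real Lisp API): once expressions are first-class objects with "functor" and "arguments" accessors, programs can inspect and transform programs without s-expression syntax.

    from dataclasses import dataclass

    @dataclass
    class Call:
        functor: str
        args: tuple

    def functor(expr):
        return expr.functor

    def arguments(expr):
        return expr.args

    def swap_args(expr):
        # an "interesting transformation": reverse the argument order
        return Call(expr.functor, tuple(reversed(expr.args)))

    e = Call('subtract', (10, 3))
    print(functor(e), arguments(e))   # subtract (10, 3)
    print(swap_args(e))               # Call(functor='subtract', args=(3, 10))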

---

A really interesting set of tradeoffs exists for the processes of learning deep ideas about programming and programming languages. And these tie in with some of McCarthy’s earliest ideas ca late 50s about “programming and computing in the future”. For example, he says in the very early “A Basis For A Mathematical Theory Of Computation”:

Programs that learn to modify their own behaviors require a way of representing algorithms so that interesting properties and interesting transformations of algorithms are simply represented.

So right off the bat he wants programs to be able to look at and “understand and reason” about programs as much as possible.

The overlap between beginners and these programs is that the fewer concepts and relationships the better. This is where Lisp and its s-expression descendants provide the interesting tradeoffs between real simplicity vs a new notation that must become fluent. The Scheme (and later with the addition of Racket) people have provided plenty of evidence that this pathway works. Doing this with more normally readable expressions plus access to the ASTs — at least in universities — seems sparse to non-existent in my admittedly limited horizons.

So it’s possible that the homoiconic approach for meta is a kind of cognitive sweet spot. On the other hand (I think) you get something much nicer in many ways by going with the way Smalltalk dealt with reflection and meta. It is less of a hack etc. and requires a bit more learning before diving in. But this approach scales more gracefully, and starts with much more protection.

What caused the ADA to be created?

The Wikipedia article is a good place to start:

https://en.wikipedia.org/wiki/Ada_(programming_language)

This was a big project with many people — quite a few of whom were really good — contributing many opinions and ideas — quite a few of which were really good.

And then there were many “other people” with “other kinds of ideas”.

Much can be understood about what happened by reading this webpage, which gives the initial “Strawman” requirements for what became ADA. (The progression of “mans” over the length of the project was: straw, wood, tin, iron, steel …)

I got to watch this happen in the 70s, and various “language people” from Xerox Parc were also asked to comment and contribute (I went to a few meetings, and others went to more of them).

I liked “kernel extensible reflective, etc.” languages, and ADA was never aimed in this direction, so my personal interest was not high — however, Dave Fisher, whose 1970 super-reflective thesis on extensible control structures I much admired, was a powerful figure at DoD for this effort. And others I admired greatly — such as Bill Wulf — contributed designs (the “Tartan” language proposal for ADA).

There was a DoD side-condition that they could not be talked out of — Fisher couldn’t either from the inside — that ADA had to run efficiently on every existing DoD computing system. It was pointed out that Moore’s Law — happening swiftly, whereas language designs and implementations, especially by committee, happen slowly — would make it ridiculous not to replace most of the DoD HW by the time ADA got implemented to a useful state.

I knew and liked Jean Ichbiah, and thought his LIS language design was pretty good (especially for something aimed at compiling into machine code). He eventually won the contract for Honeywell-Bull (and ADA turned out to be a lot more intricate than LIS was).

Bottom line: a lot can be learned about language design, about software engineering, and especially about how “whole-systems real computer science” could have helped. It is also worth noting that almost everything that was done in the 1995 revision of ADA was part of the original design discussions.

As Robert Heinlein remarked, a really bad committee is “the only form of life with 10 bellies and no brain”. This wasn’t that bad, nor was it as bad as a “typical committee”: “whose collective IQ is the minimum of the individuals in the group”. I think that ADA turned out a lot better than the earlier comparable project of IBM — PL/1 — another language that took forever, delivered not so well, and is still around (I know of at least one major insurance company in the US that still employs 2000 PL/1 programmers!)

To what extent does each invention require the invention of jargon for that invention?

Great question!

I like a process that seems generally unpopular, but has been extremely useful in science especially: that is to try to coin special words for the special new meanings, and part of getting fluent in the new knowledge is to get fluent with the new terms.

This is almost the exact opposite of normal language churn in commonsense cultures, where redefining old words to mean new things in their day and time is the general case. This makes language ever more metaphorical and idiomatic (which ironically also requires a lot of learning and memorization: but for new meanings rather than new meanings and a new term for them).

This sets up deep conflicts between the two cultures. I don’t think the compromises help either culture.

Interesting examples abound in fields that are supposed to be technical — such as computing — e.g. the term “Artificial Intelligence” was defined by John McCarthy ca 1956 and had quite a solid meaning (and very much based on the non-metaphorical combination of the two terms). Today, “AI” actually means such an extreme subset of the original definition that the original meaning has been quite lost, and references to the original meaning now have to use terms such as “Real AI”, “Strong AI”, etc. This is also the case for many other terms in computing today, e.g. “computer science”, “software engineering”, object-oriented programming, and many more. This is a mark of a “pop-culture” (not quite a “cargo-cult” yet …).

Just to round this out … it’s worth contemplating that science itself was initially also guilty of redefining earlier terms in attempts to make contact with outsiders, and reworked terms such as “know”, “proof”, “truth”, “theory”, etc in ways that still confuse non-scientists today. I think it is much better to just make up new terms for new meanings to help people realize there are important differences, and which have to be learned.

What is the best book for Research Methods in Computing?

It would be fun and illuminating to see the opinions of others — either as answers or comments here.

The best book about the ARPA-IPTO/Parc research community (1962–1982 or so) is “The Dream Machine” by Mitchell Waldrop (it has been brought back into print by Stripe Press).

This is an excellent history of those 20 years, with good historical accounts leading up to 1962.

I wrote an essay tribute — The Power of the Context — to this community in 2004, which might help a bit — and it does have an extensive bibliography.

The methods used by this research community were quite effective, and the results include many of the fundamental technologies still in use today (Internet, Ethernet, personal computing, GUI, DTP, laser printing, etc.)

The Waldrop book is mostly about what, but does give some glimpses of how.

I got asked to do several lectures at Stanford a few years ago about “How To Invent The Future”, and these have quite a bit more about methods used.

https://www.youtube.com/watch?v=id1WShzzMCQ

https://www.youtube.com/watch?v=1e8VZlPBx_0

Did Alan Kay give advice to Guido van Rossum on the design of Python? The language is said to have an object model that leans toward the original late-binding, kaynesian definition, instead of the abstract datatypes route taken by e.g. Java and C++.

I didn’t give Guido any advice on the design of Python. The aims of Python and its early history are easy to find with Google. Guido also wrote a blog with many articles about different aspects of Python history and development.

In software engineering, when might pragmatism be harmful in the short term?

Almost always. As Don Knuth wrote long ago “Premature optimization is the root of all evil”.

If the term “engineering” is to mean anything in “software engineering”, the higher priorities all have to do with the longer term life cycle requirements of the artifact — and include goodness of model, safety, testability, debuggability, understandability, changeability, scalability, portability, and many more.

One way to think of a well designed system is that the semantics will be separated from the optimizations as much as possible (one can imagine tools to help do just that) so that the system can be debugged and changed, etc., in terms of the semantics and will run — perhaps too slowly — with the optimizations turned off.

Did Atari use the Forth language for any of their software?

I was there from late 1981 to early 1984, and was not aware that FORTH was used in any Atari systems. However, it is certainly possible that a use escaped my notice.

Is it important to turn what is actually needed into a pop culture?

I think “just the opposite”.

It’s important to turn pop cultures into “what is actually needed”.

Did UNIX fulfill or exceed the vision of the Multics project?

I thought Paul Green provided a good answer to a related question.

https://www.quora.com/Why-did-Unix-succeed-and-not-Multics

The question here is about “the vision” of the two systems. I think that the Multics effort always had larger “vision” about “why?” and “what is actually needed” (large parts of the vision were “The ARPA Vision”). So my direct answer to the question would be “No”.

(Added Aug 7th — I realized that I should have mentioned that MULTICS didn’t fulfill or exceed the vision of the MULTICS project either …)

The approaches to implementation were also very different.

My opinion of Multics was that it (a) chose a poor hardware framework, and (b) perhaps overacademized the software design.

My opinion of Unix was that it (a) also chose a poor hardware framework (but a very different one), and (b) this and other factors rendered the software design too simplistic. (As with many artifacts in computing, “(b)” allowed something to get implemented anyway.)

Both systems did have a few strong ideas and principles.

My main opinion was that there were at least two alternatives to look at that were much better bases than either Multics or Unix:

The Burroughs B5500 type architecture, including multiple CPUs, fine grain protection, HLL OS and implementation languages, etc.

The “Project Genie” type architecture used in the ARPA community (that morphed into Tenex, and another sprout that was the CAL-TSS approach).

The 70s allowed practical microcoding and VLSI RAM. This was too late for Multics, but the Unix folks could have taken this route.

By the end of the 60s it was clear (to some of us at least) that the B5500 approach would be “just wonderful” if it were microcoded. Parc indeed did all of its hardware using microcode, including several mainframes, but Parc’s aim was personal computing, not time-sharing, so the mainframes were mostly used to emulate some of the hardware the external community was using (such as PDP-10s). The most interesting TSS use was “D-Lisp” of Warren Teitelman and Bob Sproull, which ran Interlisp on the mainframe and used Altos as windowed graphic terminals. This was quite good.

In my view, one of the first really great systems that was a modern version of the “ARPA vision” was Gerry Popek’s distributed multi-HW “LOCUS” OS in the 80s. This was really good. (It did use a modified Unix, but this was not a critical component of the design: any encapsulated process architecture would have worked, and some would have worked even better.)

What are the must-read books for software engineers?

I’ve been trying to find some (I got invited to do a keynote at a large upcoming Software Engineering conference, and have been looking to see if there is more of a consensus and higher level view already in the literature).

So far … one observation of what I have looked at is that “what is actually needed” to deal with the scalings from both Moore’s Law and hardware engineering doesn’t seem to be treated. I.e. not just individual computers, supercomputers, and cloud computers, but the billions of them linked by the Internet and other networks. (This is disappointing because the original Internet design efforts were aimed at 10^10 and more scalings, and were successfully pulled off by a group of top researchers.)

I welcome helpful comments and suggestions!

---

Added Aug 12th: no helpful suggestions from readers yet. Does this mean you don’t think there are any “what is actually needed” books for “software engineering”?

What is your review of The Pragmatic Programmer (1999 book)?

I don’t have a detailed memory of this 1999 book, but do remember some impressions.

The main one was that it — correctly I thought — addressed “programmers” in general as people who had not learned or done engineering in their past or as part of learning to program. Much of it was like the advice a professional musician — especially a classical or studio musician — would give to pop culture teens and twenties who are starting to play instruments, but decline to take lessons, learn to read music, etc. I.e. “there are a bunch of things you really need to buckle down and learn fluently in order to be a productive pro”.

Added Sept 1 2021 (from a comment below) — OK, I found a pdf of the 2nd edition and perused it. It has a lot of good advice (praising doesn’t require a lot of substantiation). The simplest criticism I can write here is that I think it has many too many topics and too many styles to serve as a good starting place for beginners. I think taking more of a systems approach would improve the presentation a lot. I think the “pop culture” allusions (above) were too harsh (some tinges would remain in a more balanced criticism).

Another recalled impression is that — as with so many books on programming — it was very devoted to the general milieu of programming in corporate US, and the kinds of languages, tools, and approaches most in use. This is definitely an aspect of “pragmatic programming”, and a lot of the advice along these lines seemed sound for those who chose to cope.

My view is that this kind of pragmatism is quite harmful in the long term, and often even in the short term. I.e. learning better ways to apply band-aids does increase knowledge, but if the patient is suffering from suppurating wounds, then real medicine and the learning and inventing of real medicine is what is actually needed (WIAN). I don’t recall any such rabble-rousing in necessary directions in this book.

I think the book became quite popular and successful.

Can processes notify operating system that their stack and heap will need this much of memory prior to starting? Or is it operating system's decision to give a predetermined amount of space for heap and stack for every process?

Here is one of the many great ideas in the “Project Genie” OS (lead designer: Butler Lampson; the system was later used to launch commercial time-sharing, and for the Engelbart “Mother Of All Demos” towards the end of the 1960s).

The SDS computer was a 24-bit 0.5 MIPS machine with 64K words of storage and address space. The Genie project at Berkeley added a HW memory map with 2K pages, but the tiny slow resources were real barriers for practical time-sharing with multiple dozens of users. Both long swapping latencies and thrashing were deep problems.

Butler decided to “force” working sets to be four 2K pages — so he could line up the next few working sets in storage to reduce process switching latencies to almost zero — and to reward programmers/programs that could predict the next working set. The reward was to have your process run on the fast queue. If you failed to predict, and had a page fault, he would run you on the slow queue for a while.
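A rough reconstruction of the policy as described (in Python, with invented names, not the actual Genie code): a process that stays inside its predicted working set goes back on the fast queue; a page fault demotes it to the slow queue for a while.

    from collections import deque

    class Process:
        def __init__(self, name, predicted_working_set):
            self.name = name
            self.predicted_working_set = set(predicted_working_set)   # e.g. four 2K pages

    fast_queue, slow_queue = deque(), deque()

    def requeue(process, pages_touched):
        if set(pages_touched) <= process.predicted_working_set:
            fast_queue.append(process)   # prediction held: rewarded with the fast queue
        else:
            slow_queue.append(process)   # page fault: run on the slow queue for a while

    p = Process("editor", predicted_working_set=[12, 13, 14, 15])
    requeue(p, pages_touched=[12, 13])        # stays fast
    requeue(p, pages_touched=[12, 40])        # faulted: goes slow
    print(len(fast_queue), len(slow_queue))   # 1 1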

This worked wonderfully well. The programmers did a lot more thinking about the dynamic state of their processes, and put in more SW instrumentation to help predict.

The system overall was amazingly efficient, and could indeed run dozens of users at a time without latencies and thrashing. It’s worth contemplating that the entire “Mother of all demos” was done with multiple users in real-time using this tiny little hardware of 192K bytes and 0.5MIPS.

Another of the many ideas used in this system was aggressive “before the fact” use of the swapping drum memory. The idea was to have dirty bits on pages to mark if they had been changed, and to use about 2% of the CPU etc, to continuously write out dirty pages. This meant there was almost always enough room in core to bring new pages in when needed, without having to write out something first.
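A minimal sketch of the flushing idea (in Python, purely illustrative): a background task spends a small fixed budget each tick writing dirty pages out, so a clean frame is almost always available when a new page has to be brought in.

    class Frame:
        def __init__(self):
            self.page = None
            self.dirty = False

    def background_flush(frames, budget, write_to_drum):
        # spend a small, fixed budget cleaning dirty frames ("about 2% of the CPU")
        for f in frames:
            if budget == 0:
                break
            if f.dirty:
                write_to_drum(f)   # stand-in for the swap-device write
                f.dirty = False
                budget -= 1

    def bring_in(frames, new_page):
        # a clean frame can be reused immediately, with no write-back stall
        for f in frames:
            if not f.dirty:
                f.page = new_page
                return f
        return None   # everything dirty: would have to wait for a write-back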

This combined with yet another Butler perspective: something will crash — computer hardware was quite prone to crashing all by itself — so don’t pretend that everything will work (and then get badly caught when a crash does happen). The aggressive flushing, prediction, and some other ideas not mentioned here, all allowed the system to resume within seconds of a crash.

Margaret Hamilton, who was head of the software for the Apollo Guidance Computer, called this kind of systems approach “before the fact” design (the “facts” being things you didn’t want to have happen). When asked what she was doing, she said “Software Engineering”.

In the 70s at Parc, we used some of these ideas — and much of the overall philosophy — for doing the systems aspects of Smalltalk. For example, the aggressive flushing scheme worked well for the object level swapping to a much slower secondary storage we did on the Alto.

It’s worth noting that even today, most OSs are quite rooted in the early 60s with respect to their perspective of computing, hw and sw — the notion of an OS actually “thinking about what is going on and how to help” has never crept into standard notions of computing, and joins up with the early 60s nature of most programming that is done today.

Why would people not do learning and take actions that would help them? A look at the pandemic — and many other situations in normal human life — reveals that, for many, “comfort even with high ultimate costs” is worth more than the “discomfort of learning and changing”.

What is objective quality?

Try typing it into Google and looking at the first hit (that’s what I did …)

How was personal computing influenced by the hippie movement?

The larger category of interest here is “the counterculture”, of which the hippies could be considered one branch (or possibly a parallel “counterculture”). For example, I think “The Whole Earth Catalog” is a “counterculture” medium rather than a “hippie” one. The “Free University” in Palo Alto and Menlo Park in the late 60s into the 70s was “counterculture” much more than “hippies”.

One view related to your question can be found in John Markoff’s book: What the Dormouse Said: How the Sixties Counterculture Shaped the Personal Computer Industry. This especially focuses on the Engelbart group at SRI, who were the epitome of ideas about personal computing in the 60s. Markoff was a student at Stanford while all this was going on, and he remains as one of the most respected techno-journalists.

My view — from having started to visit Palo Alto in the late 60s and living there from 1969 on — is that “hippies” had essentially nothing to do with the development of personal computing, but that the surrounding context of counterculture “helped”. You could look at my Quora answer about The Whole Earth Catalog for more info.

https://www.quora.com/What-was-the-Whole-Earth-Catalog

In the STEPS report, an unresolved problem of massive intermodule coordination and communication is mentioned. Is it more of a hardware problem, or a software/design problem?

For a long time, this was a problem that never quite got worked on — it lost out to other problems — but one I thought “someone” should do (maybe us).

I tossed it into the list of techniques that would make STEPS interesting, and again, it didn’t quite make the cut during the actual work (though we did look at and build some infrastructure for subproblems along these lines).

The basic insight was that extreme scaling was going to make more and more difficult “knowing the actual labels” of useful things, and that something more like “searching for things analogous to descriptions of structures/objects” would be more and more needed.

A simple example: suppose you need the sine function but don’t know that it is called “sine” (e.g. APL doesn’t call it anything like “sine”). Ted Kaehler’s “method finder” can find it in Smalltalk, because one of the things it can do is to look for I/O relationships to find the operation. So if you give it 30 and .5 it will find “degreeSin”. If you give it 3, 4 and 7 it will find “+”, “or”, etc. If you give it an unsorted array and then a sorted version of the array, it will find the sorting functions in Smalltalk. Etc.
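A rough sketch of the idea (in Python, with a made-up catalog, not Ted Kaehler's Smalltalk tool): search a catalog of operations for ones whose input/output behavior matches the given examples.

    import math
    import operator

    CATALOG = {
        'degreeSin': lambda d: math.sin(math.radians(d)),
        '+': operator.add,
        'bitOr': operator.or_,
        'max': max,
        'sorted': sorted,
    }

    def matches(result, expected):
        if isinstance(result, float) or isinstance(expected, float):
            return abs(result - expected) < 1e-9   # tolerate rounding
        return result == expected

    def find_operations(inputs, expected):
        found = []
        for name, fn in CATALOG.items():
            try:
                if matches(fn(*inputs), expected):
                    found.append(name)
            except Exception:
                pass   # wrong arity or type: not a match
        return found

    print(find_operations([30], 0.5))               # ['degreeSin']
    print(find_operations([3, 4], 7))               # ['+', 'bitOr']
    print(find_operations([[3, 1, 2]], [1, 2, 3]))  # ['sorted']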

One of the terrific theses — and people — we were inspired by during this time was “AM” of Doug Lenat. This was a “discovery system” that could “extract meanings from possibilities”. It had to have a description scheme, and I realized that something like it could be used for the module problem.

One of the things we spent some time thinking about at Parc and thereafter, was “Semantic Typing”, which is another example.

And in the 80s came Gelernter’s “LINDA” coordination language, which I thought was really a neat way to think about the larger problems of messaging and concurrency, etc.

I’m a bit surprised that no modern day Kaehler, Lenat, or Gelernter has stepped forth to do this. It is a really important problem, and the solution (I believe) would be quite beautiful.

It’s similar to “communicating with Aliens” in the stars, but quite a bit easier.

Could some kind of analogy be made up between object-orientation and set theory?

At Parc, we put some thought into this. One idea was to look at the relationship between the set of objects you retrieve in a search and the set of objects a Smalltalk class generates, which obey the description in the class.

One way to look at this is that the objects of a class that haven’t been instantiated yet could be thought to exist — in potential — in a kind of “Platonic Wilderness”, and that saying “new” to the class was a request to “find” one of these, and bring it out of the “wilderness” into more tangible “being”. Or, for example, that “3+4” could be thought of as a kind of search request that will “find” a “7”.

And Smalltalk actually had a message “allInstances” that could be sent to any class and this would gather a set of all the extant instances …

In set theory there is a term “characteristic function” that is an expression which determines/defines what is in a set … A Smalltalk class could be considered to be the “characteristic function” of its possible instances.

A search of objects to retrieve a result set — on the other hand — is very “Aristotelian” (he liked to generalize from collections of examples) … The characteristic function of these sets is the search criterion.
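A tiny sketch of the contrast (in Python, purely illustrative): the class acts as the characteristic function of its instances (with a stand-in for Smalltalk's allInstances), while a search gathers an "Aristotelian" set by a criterion.

    class Point:
        _instances = []   # stand-in for what Smalltalk's allInstances would gather

        def __init__(self, x, y):
            self.x, self.y = x, y
            Point._instances.append(self)

        @classmethod
        def all_instances(cls):
            return list(cls._instances)

    Point(1, 2); Point(-3, 4); Point(5, 6)

    # the "Aristotelian" set: membership decided by a search criterion
    in_first_quadrant = [p for p in Point.all_instances() if p.x > 0 and p.y > 0]
    print(len(Point.all_instances()), len(in_first_quadrant))   # 3 2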

What if you kept these live? What if you could “force” an object into one of these sets?

In the spirit of “not multiplying entities unnecessarily” (what Occam actually said), it was untidy to have two things that were almost the same idea.

I wrote a paper about what it might be like to unify the two kinds of sets. Some form of this is still an interesting idea …

Can the concept of Unix pipes be added to apps and GUI's?

The “pipe” idea in Unix was originated by Doug McIlroy in the early 70s, but the notion of interprocess communication was much earlier, and one of the first systems to use interprocess communication (and to invent “data-flow programming”) was by Bert Sutherland (Ivan’s brother) at Lincoln Labs ca 1965.

[image: Parallel data-flow programming with graphical programs, ca. 1965]

I forgot to say: so the answer is “Yes”!

What is the relationship among programming paradigm, object-oriented programming, and event-driven programming?

“OOP”, especially in its original form, is a comprehensive way to define any computational idea by using intercommunicating entities that are a single kind of universal building block with the semantics of a whole computer.

One of the most natural ways it has been used is to model complex systems, including complex software systems.

In OOP, a message between entities is an “event”.

To do “event-driven programming” one would organize one or more “event-driver objects”, to which entities that desire to be “event-driven” would initially subscribe. When an event-driver object senses an event, it will send wake-up messages to all objects that have subscribed to it.
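A minimal sketch of that organization (in Python, with invented names, not from any particular system): objects subscribe to an event-driver object, and the driver sends wake-up messages to every subscriber when it senses an event.

    class EventDriver:
        def __init__(self):
            self.subscribers = []

        def subscribe(self, obj):
            self.subscribers.append(obj)

        def sense(self, event):
            # the driver noticed an event: wake everyone who asked to be driven by it
            for obj in self.subscribers:
                obj.wake_up(event)

    class Logger:
        def wake_up(self, event):
            print("woke up with", event)

    driver = EventDriver()
    driver.subscribe(Logger())
    driver.sense("mouse-click at (10, 20)")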

What's the difference between levels and paradigms in programming? Can we talk about declarative and imperative paradigms only when talking about high level programming languages?

Most computers in history, including most today, are set up to have many of their lowest level instructions be “commands” (in other words “imperative”), and especially with regard to rewriting lowest level memories.

A “declarative” form at the lowest level would be something in which the desired relationship of the result — but not how to compute it — is interpreted by the hardware. Note that this would be the case for most elementary “operations” such as “ADD”, “SUB”, “MUL”, “DIV”, etc. There are different ways to do these in the hardware, and most computers do not allow the programmer to say which or to provide alternatives.

So I think we just did talk about “declarative” and “imperative” ‘paradigms’ without mentioning “higher level programming languages” (until just now).

What progress have computer scientists made toward solving the symbol grounding problem?

Excellent question! Simple answers are “some”, “not a lot”, and “not a lot of work is going on these days on this problem”.

(Note that David Vanderschel’s answer indicates that he thinks “imitations” by ML systems (ML as Judea Pearl points out is an extended form of “curve fitting”) are doing well enough. I don’t agree — I think that “organizations of organizations” will be necessary for the lower level simplicities to be effective.)

A preliminary question is “how do humans seem to do ‘symbol grounding’ ?”. Our brains are also isolated from the world we live in, and are connected to the outside only by simple representations of “phenomena”. We can see that, philosophically, the problems of grounding are rather similar — the big questions have to do with the kinds of organizations needed.

A place to start that has considerable thought and many examples is the CYC system of Doug Lenat, which is a very large working model of an attempt to capture much of human commonsense reasoning and beliefs. It’s worth poking into this work to see how it accomplishes “grounding of its symbolic representations”.

Would a computer scientist from 1985 understand an iPhone and how it worked? Would they be able to take it apart and advance our civilization?

Sure

Remove Twitter and Facebook?

What elements of syntax should always be taken into account?

“Syntax” is the position of something relative to the position of something else. It can be in any number of dimensions. It’s usually thought of as 1-dimensional (a string of characters or words, etc.) but in a GUI, quite a bit has to do with nearness and enclosures, and occasionally overlaps (another .5 D).

It is a very good idea to learn how to be very flexible with regard to syntax because this helps learning new organizations of things. (This seems to be a rather lost trait for most computerists today.)

How does a desktop environment work? How does a GUI work? Does the OS recognize where the mouse is and on top of what? Does it depend on the pixels color?

Here is the entire memory of the Xerox Parc Alto computer in the early 70s showing every bit. You can see that half of it was used to hold the bits (1 bit per pixel) that would show up on the 606x808 screen as black or white pixels.

[image]

And here’s the Alto itself with that image on the screen.

[image]

The CRT (TV like display) is constantly scanning from left to right and then moving a line downward. Some software and a small piece of hardware are picking the bits of the image from memory so that the beam can be turned off or on depending on whether a pixel should be black or white.

While this is going on, the physical mouse is also being tracked. When the mouse is moved with one’s hand, a program adds the incremental change in the up-down direction to get a new screen up-down location (called the “y” location, measured from the top of the screen) and a new left-right location (called the “x” location, measured from the left). This is used by another little piece of hardware and software to display the cursor image on the screen.

Meanwhile, there are other programs also running in the rest of the memory, and some of these have structures that represent the rectangular images we see on the screen (a rectangle can be represented by the x-y location of the upper left corner (called the “origin” of the rectangle), and the x-y location of the lower right corner (called the “corner”)). It also knows which rectangles are on top of or below the others.

When you press on the mouse button, the program compares the mouse location to the list of rectangles, starting with the topmost one until it finds the first one that has the mouse located inside it.

The comparison is simple:

if mouse x is greater than origin x and less than corner x

and

if mouse y is greater than origin y and less than corner y

then the mouse is enclosed by the rectangle.

If so, then it wakes up the program that deals with that rectangle and asks it to find where the mouse is pointing to things inside the rectangle (these are also always things with rectangular boundaries). At some point, a rectangle is found that knows what should be done when it is pointed at, and another program is awakened to cause the desired action.
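A small sketch of the hit test just described (in Python, purely illustrative, not the Alto code): rectangles are kept topmost first, each with an "origin" (upper left) and a "corner" (lower right), and the first one that contains the mouse point handles the press.

    class Rect:
        def __init__(self, name, origin, corner):
            self.name = name
            self.origin = origin   # (x, y) of the upper left corner
            self.corner = corner   # (x, y) of the lower right corner

        def contains(self, x, y):
            return (self.origin[0] < x < self.corner[0] and
                    self.origin[1] < y < self.corner[1])

    def hit(rectangles, mouse_x, mouse_y):
        # rectangles are listed topmost first
        for r in rectangles:
            if r.contains(mouse_x, mouse_y):
                return r
        return None

    windows = [Rect("dialog", (100, 100), (300, 200)),
               Rect("desktop", (0, 0), (606, 808))]
    print(hit(windows, 150, 150).name)   # dialog
    print(hit(windows, 50, 400).name)    # desktop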

What's the best way of combining functional and OO programming in practice?

I’m assuming that the terms in the question are contemporary “code words” for very particular practices that now have the labels “functional programming” and “object-oriented programming”.

I’m going to use much older and more original meanings. For example, I’ve got a pure math degree in my misspent youth, and so I think of functions in terms of correspondences between “inputs and outputs”, and “official functions” as a particular kind of never changing correspondence between inputs and outputs, and that they have more relaxed relatives with looser correspondences between inputs and outputs.

Similarly, I had something to do with programming and systems semantics in the 60s, and one of the “recognitions” I had back then was — since any computer with enough storage can compute what any other computer can compute — that all computing semantics can be represented just with “semantic computers intercommunicating with messages” as the sole building block.

I should have come up with a better term than “object-oriented” for this, but that ship has long sailed.

So any system with “semantic messaging computers” as objects can easily make any kind of “functional programming” system.

But that’s not very interesting, except to note that a lot of expressive power would be thrown away by doing so.

Another way to look at this is that “semantic messaging computers” allow any kinds of definitions for any kinds of things-in-process-or-not, so we can use it to easily make old style programming things like data-structures and procedures. This would be a very weak thing to do, but it is the predominant style of what is called OOP today.

Or, we could note that “semantic messaging computers” can define and simulate the parts of any complex system no matter the size and complexity, and to render the parts themselves as systems, and so on recursively. That’s a much more powerful way to look at things (and is the way I looked at things back in the 60s).

Given this — and especially given the possible scalings involved — we need to think carefully about ways to make complex systems that allow us to do what we actually need, and also be able to think about — reason about — what we are doing.

We especially need to have some good ideas about how to deal with “space and time”. All of our systems need to “move through time” (which means their state will be different in some ways at different times). This is true even for a pure “function” of the first kind (which always produces the same output for a particular set of inputs) — Why? Because we have to make the function in the first place, and we have to debug it and often improve it. A pure functional language cannot do that — some forms of kludge — or perhaps a really good idea? — have to be added.

I like to point to John McCarthy — one of the greats in our field, and — as with so many of the real greats — one who was very early with his ideas and impact. John was an excellent mathematician who also had a deep feel for what is special about computing.

In the latter part of the 50s John got more and more interested in what he started to call “Artificial Intelligence”. He was also doing some consulting and this brought him in contact with the SAGE air defense system: large systems of very large computers attached to radar stations and each other and usable by graphical display systems with pointing devices.

John’s reaction was “Every home in America will have one of these”. He could see that the networked computers could be thought of as an “Information Utility” (as a parallel to the existing utilities for electricity, water, gas, etc…) and that the terminals in the homes could provide many kinds of “information services”. Among other things, this got him to advocate that MIT etc do “time-sharing” of their large mainframes …

He also realized that the computer milieu of the 50s — machine code and the new Fortran — did not intersect well with “most people in US homes”. This got him to write a paper in 1958 — “Programs With Common Sense” — and to suggest that what was needed for the user interface was an active semi-intelligent agent — the “Advice Taker” — that could interact with users in their commonsense terms, could learn from “taking advice”, could problem solve on behalf of the user and itself, and so forth (MIT AI Memo 17).

This got him thinking about how to implement such an Advice Taker, whose main mechanisms would be various kinds of logical deductions including those that required actions. There wasn’t much to go on back then but a few gestures at “list processing”, so he decided to invent a language that could be used to make the Advice Taker (and other kinds of robots), and more generally allow symbolic computation to take its place alongside the existing numerical computation.

John was an excellent mathematician and logician, and so he also wanted to come up with “A Mathematical Theory of Computation” to put ideas old and new on a firmer basis.

His result was LISP (for “LISt Processing”). I have written elsewhere about its significance.

Meanwhile, he was pondering just what kind of logic, math, and programming (he thought of these as highly intertwined) could be used to deal with a robot in the real world.

<eliminating detail here> A conflict was between at(robot, philadelphia) and at(robot, new york), which could not happen simultaneously, but could happen “over time”. This was like the problem of contemporary programming where variables (and sometimes even files) would be overwritten — basically, letting the CPU of the computer determine “time”.

This destructive processing both allows race conditions and also makes reasoning difficult. John started thinking about modal logics, but then realized that simply keeping histories of changes, and indexing them with the “pseudo-time” at which a “fact” was asserted to hold, could allow functional and logical reasoning and processing. He termed “situations” all the “facts” that held at a particular time — a kind of a “layer” that cuts through the world lines of the histories. cf. McCarthy, “Situations, Actions, and Causal Laws”, Stanford, 1963, prompted by Marvin Minsky for “Symbolic Information Processing”.

One of the ways of looking at this scheme is that “logical time” was simply to be included in the simulations, and that “CPU time” would not figure into any computation.
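A rough rendering of the idea in Python (a toy of my own, not McCarthy’s notation): facts are never overwritten; each assertion is recorded with the pseudo-time at which it begins to hold, and a “situation” is just the set of facts that hold at a chosen time.

# Toy sketch of situations and fluents: nothing is destructively updated.

class World:
    def __init__(self):
        self.history = []                    # list of (pseudo_time, fluent, value)

    def assert_fact(self, t, fluent, value):
        self.history.append((t, fluent, value))

    def situation(self, t):
        # all facts holding at pseudo-time t: the latest assertion for
        # each fluent made at or before t
        facts = {}
        for when, fluent, value in sorted(self.history):
            if when <= t:
                facts[fluent] = value
        return facts

w = World()
w.assert_fact(1, "at(robot)", "new york")
w.assert_fact(2, "at(robot)", "philadelphia")
# no conflict: the two locations simply hold in different situations
assert w.situation(1)["at(robot)"] == "new york"
assert w.situation(2)["at(robot)"] == "philadelphia"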

<more detail excluded here> This idea did not die, but it didn’t make it into the standard computing fads of that day, or even today. The dominant fad was to let the CPU run wild and try to protect with semaphores, etc. (These have the problem of system lockup, etc., but this weak style still is dominant.)

Systems that have used part or all of John’s insight include Strachey’s CPL, Lucid, Simula, etc. Look at Dave Jefferson’s TimeWarp schemes, Reed’s NetOS, Lamport’s Paxos, the Croquet system, etc.

Back to McCarthy and — now — objects. One of the things we realized at Parc was that it would be a very good idea to implement as much of John’s “situations” and “fluents” as possible, even if the histories were not kept very long.

For example, this would allow “real objects” to be world-lines of their stable states and they could get to their next stable state in a completely functional manner. They would be “viewing themselves” with no race conditions to get their next version.

This would also be good for the multiple viewing we were starting to use. You really only want views to be allowed on stable objects (and relationships), and this can be done by restricting viewing to already computed “situational layers”.

Parc was also experimenting with “UNDO” and the larger community was starting to look at “parallel possible worlds reasoning”.

The acts of programming itself also wanted to be in terms of “histories and versions” and systems should be able to be rolled back to previous versions (including “values”, not just code). cf Interlisp, and especially the PIE system (done in Smalltalk by Goldstein and Bobrow).

This was another motivation for “deep John” in future systems. I.e. do everything in terms of world-lines and “simulated time”. A recent paper by Alex Warth shows some ways that “Worlds” can be quite fine-grained. http://www.vpri.org/pdf/tr2011001_final_worlds.pdf

The last point here is that “Histories R US”. I.e. we need both progression in time for most of our ideas and rememberings and we also want to reason clearly about how every detail was arrived at (and to advance the system).

John McCarthy showed us how to do this 60 years ago this year and wrote it down for everyone to read and understand.

So: both OOP and functional computation can be completely compatible (and should be!). There is no reason to munge state in objects, and there is no reason to invent “monads” in FP. We just have to realize that “computers are simulators” and figure out what to simulate.
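Here is one concrete way to read that last sentence (a sketch of my own in Python, not anyone’s actual implementation): an “object” keeps a world-line of immutable states, and each message computes the next state as a pure function of the current one, so nothing gets munged and no monad is needed.

# Sketch: an object as a world-line of immutable states. Each message
# computes a new state functionally from the current stable state and
# appends it to the history; every earlier version stays viewable.

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class AccountState:
    balance: int

class Account:
    def __init__(self, opening_balance):
        self.versions = [AccountState(opening_balance)]    # the world-line

    def current(self):
        return self.versions[-1]

    def deposit(self, amount):
        # pure step: old stable state -> new stable state
        new_state = replace(self.current(), balance=self.current().balance + amount)
        self.versions.append(new_state)
        return new_state

a = Account(100)
a.deposit(25)
assert a.versions[0].balance == 100    # the history is still there to be viewed
assert a.current().balance == 125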

How did Smalltalk-80 manage to work with only 32,000 objects maximum (roughly)?

The simplest answers are that “it had to” and “we really wanted it to” and “we had Dan Ingalls”.

The Alto computer (April 1973) for many years only had 128KB of RAM, organized as 64K 16-bit words, and roughly half of this was used for the 800x600 bit-map screen buffer. The 1K of microinstructions in a separate very fast 32-bit-wide memory and the 16 zero-overhead task-switching program counters allowed almost all of the functions done in the HW of more vanilla architectures to be rendered as SW. And this allowed us to come up with more ideal frameworks for VHLLs and still to have them run efficiently enough to make progress.

The small amounts of working memory available made us want objects to have small footprints: the average Smalltalk object was less than 16 bytes in size. And we wanted to have as many as the address space would allow, so we made the address space of Smalltalk be 16 bits in object terms. We used the Lisp trick of having the even numbered objects be an encoding for small integers (the “tag bit” was the lowest order bit), so Smalltalk really had 64K objects, of which half were small integers (but I’ll stick to the 32K for “others”). This meant that a typical Smalltalk full image would still fit in about 1MB of space (and this allowed for the object swapping scheme that Smalltalk-76 used with the very slow and small “3rd level memory” Alto disk packs).
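To make the arithmetic concrete, here is a sketch of that tagging trick in Python (an illustration of mine; the actual Smalltalk-76 details, such as the sign handling, differed):

# Sketch following the description above (not the actual Smalltalk code):
# a 16-bit "oop" whose low bit is 0 encodes a small integer directly,
# while an oop whose low bit is 1 indexes a table of up to 32K real objects.

def encode_small_int(n):
    assert -16384 <= n <= 16383, "must fit in 15 bits"
    return (n << 1) & 0xFFFF          # low bit 0 marks a small integer

def encode_object_index(i):
    assert 0 <= i < 32768
    return (i << 1) | 1               # low bit 1 marks an object-table index

def decode(oop, object_table):
    if oop & 1 == 0:
        n = oop >> 1                  # recover and sign-extend the 15-bit integer
        return n - 32768 if n >= 16384 else n
    return object_table[oop >> 1]

assert decode(encode_small_int(-3), []) == -3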


Xerox Parc Alto — including display, keyboard, mouse, disk packs — and disk drive and bulk of the machine under the table


A schematic of Smalltalk-76/78 showing the logical storage organization of a running system, and a one-page “simulator” of the underlying byte-code interpreter (most of which was done in Alto microcode (Ingalls 78)).

Much of both the architecture and compactness of the implementation was due to Dan Ingalls, who was recently inducted into the Computer History Museum Hall of Fellows (my vote for the top all around award in computing).

A good example of what can be done with less than 32K objects can be seen in the tribute video I did for Ted Nelson some years ago now. For a Christmas vacation project ca 2014, a few of us decided to see if we could revive a mysterious file from a Parc Alto disk pack that Xerox had thrown away, but which had luckily been rescued. It seemed to be a Smalltalk image file from the version done for the portable Notetaker computer in 1978.


Xerox Parc Notetaker portable personal computer ca 1978 with both mouse and touch sensitive screen

This was the first Smalltalk in which almost everything was completely written in Smalltalk, and only a tiny kernel had to be written in some lower level code. The story of this revival can be found here: Reviving Smalltalk-78—The First Modern Smalltalk Lives Again. The heavy lifting for the revival was done mainly by Dan Ingalls and Vanessa Freudenberg.

This was so much fun that I decided to use it to do all the presentation material for the tribute talk I was asked to do for Ted Nelson’s 75th birthday (video starts at 2:15).

https://www.youtube.com/watch?v=AnrlSqtpOkw&t=135s

You can see that this system from 1978 could do quite a lot with <32K objects and about 10K lines of Smalltalk code for everything down to the metal, including what would be called an “OS” today.

The main difference here is that the revival had tons of memory space as working memory, and this allowed many more of the bit-map images used in the presentation to fit into physical memory than was possible back then. Also, I am using the Dorado screen aspect ratio — 1024x800 — instead of the Alto or Notetaker dimensions. As noted in the video, Javascript is many times faster than even the Dorado microcode at Parc, so it was relatively easy to write the kernel and automatically get a good cycle time for an end-user.


The Xerox Parc Dorado console and screen ca 1979 running Smalltalk-78

I wound up using almost all of the 32K possible objects for this video, so this is also a way to see what can be done with this limit.

The “How” part of the answer has to do with “actual complexity” vs “accidental complexity” and how the Smalltalk language and the Smalltalk system were designed.

Smalltalk-76 was the first of the Smalltalks to render most of itself in itself, and much of the neatness of the design was due to Dan Ingalls’ ability to “choose wisely”. One of my favorite papers in computing is his “Principles of Programming Languages-78” paper about Smalltalk-76, and this will give some of the flavor. The bottom line is that a really good and pretty complete system can be done with about 50 classes. Smalltalk-78 a few years later attempted to do a “really complete” system, and this required less than 100 classes.

It was clear back then that inheritance in the way that Simula used it was on the one hand not strong enough to do a great job expressively, and on the other could be used willy nilly to violate good design principles (and the idea of “inheritance” itself). So “we” tried to be very careful. In the case of both -76 and -78, “we” was really Dan, who would pretty much rewrite the entire system for each of our major releases. This produced beautiful and compact expressions that had very little “accidental complexity”, and gave interesting examples of the “actual complexity” of relatively large interactive graphical systems.

Bottom line: you don’t need a lot of code in a well designed language and system to do a lot. So: besides getting a lot in about 10K lines of code, this also translated into getting a lot from less than 32K objects.

Is Object-Oriented Analysis to object-oriented design an easy transition?

I’m pretty sure I don’t understand this question.

I think design in general is extremely difficult to do well. Most things that can be designed have a systems nature, and a good perspective on this will quite often lead to organizations that can be rendered on computers using a good enough object oriented environment.

Papers about the Smalltalk history refer to the importance of the Xerox Alto's design as a "meta-computer", in which many unanticipated needs could be emulated in microcode. Why are modern computer architectures not microcoded?

The idea of microcode was invented by Maurice Wilkes, a great pioneer who arguably made the earliest programmable computer — the EDSAC (pace Manchester Baby). The idea depends partly on the existence of a “large enough” memory that is much faster (3–10 times) than the 1st level RAM of the computer.

A milestone happened when the fast memory for microcoding was made reloadable. Now programmable functions that worked as quickly as wired functions could be supplied to make a “parametric” meta-machine. This technique was used in all of the Parc computers, both mainframes and personal computers.

Typical ratios of speed of microcode memory to RAM were about 5x or more, and e.g. the first Altos had 4 Kbytes (1K microinstructions) that could be loaded on the fly. The Alto also had 16 program counters into the microcode and a shared set of registers for doing work. While running, conditions on the Alto — like a disk sector passing, or a horizontal retrace pulse on the CRT — were tied to the program counters, and these were concurrently scanned to determine the program counter that would be used for the next microinstruction. (We didn’t like or use “interrupts” … )

This provided “zero-overhead tasking” at the lowest level of the machine, and allowed the Alto to emulate almost everything that used to be the province of wired hardware.
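A crude model of the idea in Python (nothing like the real hardware, and the task names, priorities, and scan are made up just to show the shape of it): on every microinstruction cycle, the highest-priority task whose wakeup condition is raised supplies the next program counter, so switching between tasks costs no extra cycles.

# Crude sketch: 16 tasks could each have their own program counter; on each
# cycle the highest-priority task with a raised wakeup condition runs next.

class MicroTask:
    def __init__(self, name, priority):
        self.name = name
        self.priority = priority     # e.g. disk > display > emulator
        self.pc = 0                  # this task's own program counter
        self.wakeup = False          # raised by a hardware condition

def next_task(tasks):
    ready = [t for t in tasks if t.wakeup]
    return max(ready, key=lambda t: t.priority) if ready else None

tasks = [MicroTask("emulator", 0), MicroTask("display", 8), MicroTask("disk", 14)]
tasks[0].wakeup = True               # the emulator is always willing to run
tasks[2].wakeup = True               # a disk sector just passed
t = next_task(tasks)                 # the disk task wins this cycle
assert t.name == "disk"
t.pc += 1                            # it executes one microinstruction; then scan again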

This made the machine affordable enough that we were able to build almost 2000 of them, and fast enough to do the functionality of 10–15 years in the future.

Key uses of the microcode were in making suitable “language machines” for the VHLLs we invented and used at Parc (including Smalltalk, Mesa, etc.), doing real time high quality graphical and auditory “animations/synthesis”, and providing important systems functions (e.g. certain kinds of memory management) as they were invented.

It’s worth looking at what could have been done with the early 16-bit VLSI CPUs such as the Intel 8086 or the Motorola 68K. These were CISC architectures and were fast enough internally to allow a kind of microcoding to support higher level language processing. In particular, it is important to keep what is in effect an interpreter from having its code fetched from the same RAM it is trying to emulate in.

The 68K, in fact, used a kind of “nano-coding”, which could have been directed to reloadability and language processing.

The big problem back then was that neither Intel nor Motorola knew anything about software, and they didn’t want to learn (and they didn’t).

The nature of microcode is that architectures which can do it resemble (and anticipated) the RISC architectures. And some of the early supercomputers — like the CDC 6600 — were essentially RISC architectures as well. So there was quite a bit of experience with this way of thinking.

In the 80s, the ratio between RAM and CPU cycles was closing, and Moore’s Law was starting to allow more transistors per chip. Accessing a faster memory off CPU chip started to pay off less (because going off chip costs in various ways, including speed).

Meanwhile, it was well known that caching could help most kinds of architectures (a landmark study by Gordon Bell helped this understanding greatly), and that — if you are going to cache — you should have separate caches for instructions and for data.

Up to a point, an instruction cache can act like a microcode memory for emulating VHLLs. The keys are for it (a) to be large enough to hold the inner loops of the interpreter, (b) to not be flushed spuriously, and (c) for the machine instructions to execute quickly compared to the cache memory cycle.

Just to point the finger at Intel again, they did a terrible job with their cached architectures, in part because they didn’t understand what could be gained with VHLLs.

A really interesting design was the first ARM — which was a pretty clean RISC and tidy in size. It could have been used as an emulator by wrapping it with fast instruction memory, but wasn’t. I think this was a “point of view” disconnect. It was a very good design for the purpose of its designers, and there wasn’t enough of a VHLL culture to see how it could be used at levels much higher than C.

If we cut to today, and look at the systems that could be much better done, we find that the general architectures are still much too much single level ones, that ultimately think that it is good to have the lowest levels in a kind of old style machine code programmed in a language like C.

A very different way to look at it might be to say: well, we really want zillions of concurrent and safe processes with very fast intermessaging programmed at the highest levels — what kind of architecture would facilitate that? We certainly don’t want either “interrupts” or long latency process switching (that seems crazy to “old Parc people”). We probably want to have “data” and “processing” be really close to each other rather than separated in the early von Neumann ways.

And so forth. We won’t be able to be perfect in our hardware designs or to anticipate every future need, so we must have ways to restructure the lowest levels when required. One way to do this these days is with FPGAs. And given what it costs to go off chips, microcoding is far from dead as another way to help make the systems that we desire.

The simple sum up here is that “hardware is just software crystallized early”, and a good systems designer should be able to design at all levels needed, and have the chops to make any of the levels if they can’t be purchased …

Who created the first computer with GUI and in which language?

I’ve written elsewhere on Quora about this and similar questions.

Both of the “earliest” British computers — the Cambridge EDSAC and the Manchester Baby/Mark I — had electrostatic CRTs, which could show a limited 2D bit-map of part of storage, and both were soon used for games. But I have never been able to find any mention of pointing devices.

There were 2D tactile sensors going back into the 20s in the form of joysticks with potentiometers on them that could be used for remote control of airplanes, etc. (I put a picture of one of these in another answer.)

It is very likely that Whirlwind I at MIT was the earliest use of a pointing device on a display (the light pen/gun invented by Bob Everett in the very early 50s).


I don’t think Whirlwind II (the SAGE air defense system) invoked commands with the light gun, but used it to point at the displays of different aircraft info on the screen.


SAGE Interactive Terminal With Light Gun (ca. 1956–7)

My vote for “a very early” real system with a real GUI, including a pointing device, is Ivan Sutherland’s Sketchpad (ca 1962). It was written in a sophisticated macro assembler — CORAL — done by Larry Roberts (who later was the driving force behind ARPAnet) and ran on the TX-2 (the SAGE experimental computer at Lincoln Labs).

https://youtu.be/495nCzxM9PI

By the way, the Wikipedia article on GUIs and their history is both inaccurate, and also leaves out important early systems. I advise not taking it too seriously (this is often the case with Wikipedia articles about computing, especially “earlier” computing).

For example, the article wrongly states that the Parc GUI didn’t have icons, dragging, etc. (why are people who are ignorant so pugnacious about bluffing?)

Here is one of the systems at Parc (done in Smalltalk ca 1974) showing many important aspects of what could be done, including multiple windows, icons, dragging, drawing, real-time animation, etc.

https://youtu.be/p2LZLYcu_JY?t=2432

When hobbyists built their own computers back in the 70s what applications did they write?

That was a tough time for personal computing. Amongst the disenfranchised, there was a hunger for participation, and most did not have access or the funds to rent time-sharing terminals.

At Parc we were lucky to have the funding and a few geniuses to be able to design and build our own personal computers, and to make them fast and large enough to do what personal computers 12–15 years in the future were going to do. So we did “real applications” of the future, such as the WYSIWYG word processor that became Microsoft Word in the 80s.

Even so, we would have had a very difficult — not impossible — time of it if the Intel 1103 dynamic RAM VLSI chip hadn’t appeared in 1971. Decent 2nd or 3rd level storage was in the future, and powerful enough VLSI CPUs were not to happen until the very end of the 70s.

So most hobbyists tried to make do with either making or buying an 8-bit micro. The displays were often just TV screens with a low res D/A modulation of an NTSC carrier.

These could be programmed in either machine code or sometimes in a rudimentary version of BASIC (this could have been quite a bit better, and especially if JOSS had been taken as a model).

This sounds limited — it was — but consider that Val Schorre was able in 1963 to do Meta II in just 8K 6-bit bytes on an IBM 1401 (a much less nice architecture). Peter Deutsch was able to do a remarkable Lisp in 1965 on a 4K-word PDP-1 (roughly 8K+ bytes).

In practice, most of the hobbyists didn’t learn what Schorre and Deutsch knew about computing, and were content to try to make things they felt were in their space, such as games and often simple filing systems. The original BASIC at Dartmouth in the 60s was a little richer, but was part of a time-sharing system that had considerable file storage. Dartmouth had already made the university library open to anyone in the town (Hanover) and decided to do the same thing with the TSS. So one could find in the 60s in Hanover that the gas stations and convenience stores had teletype terminals with real apps made by the end-users that were used every day. This was really interesting! And more like the future than the 8-bit micros were.

In the 80s, when Apple released HyperCard, there was a huge surge of home-built useful applications, some really interesting, and some quite terrific.

What are the pitfalls in implementing QFD?

In Marketing “the customer is always right”. In Education “the customer doesn’t know enough and needs to be taught stronger ideas”.

Most of the important computer technologies in use today — Internet, personal computing, GUIs, etc. — were not thought about by customers. Some of the best of these — and most useful for customers — were never adopted because the customers didn’t want to learn new things, and the vendors couldn’t (or didn’t want to) find ways to fool them into real progress.

So the real pitfall in schemes like QFD is that real progress often doesn’t get made, and — worse — the results are often retrograde.

A much better idea is to not think of “customers” — but human beings — and not what they want but what they actually need. Then figure out how to get them to learn it.

What was the problem that HyperCard solved?

The question should be rephrased as “For what vital parts of Personal Computing was HyperCard a great advance?”

“Personal Computing” and “Personal Computers” were mostly conceived and defined by the ARPA-IPTO/Parc research community of the 60s and 70s, but whose roots reached back into the 50s, especially to Whirlwind at MIT.

“Really early” examples of these were “George” (an interactive higher level programming language on Whirlwind ca. 1954 by Laning and Zierler), and the Bendix G-15 (by Huskey and Evans, et al. ca. 1956).

These helped lead the ARPA community to conceive of “personal computing” as including the ideal and goal of interactive “symmetric use and authoring” of “everything”. The idea was that there should be ways to present to end-users to allow them to do all of the kinds of things that (a) could be done with computers and/or (b) were done by computer professionals, (c) and especially new kinds of things that computer professionals hadn’t thought about or had devalued.

This included “end-user programming” — and it was clear that some considerable work would need to be done to allow non-pros to do this.

“Definitive and early” examples were “Sketchpad” by Ivan Sutherland, “The LINC” personal computer by Wes Clark, both at Lincoln Labs; and especially: JOSS by Cliff Shaw at RAND, all around 1962. The latter was especially aimed at people who could really make use of a well designed higher level “end-user” programming system: it was the first really great “end-user programming system”.

Also in 1962, Engelbart submitted his first proposal and plan for a large scale wide spectrum personal computing system. This became one of the central resources for “what is it that we are trying to do?”.

Xerox Parc in the 70s was an outgrowth of ARPA-IPTO, and we wanted to do “all of the above” plus to add “media” and “meta-media” as a first class goal. This was partly sparked by the Dynabook idea (with a flat-screen bit-map display).

This led at Parc to the bit-map display, shaded images and being able to paint them, high quality fonts, the GUI, desktop publishing, 2.5D animation, and much more, including: end-user programming in Smalltalk by both adults and children, that included being able to program the media as well as use it and construct it.

This sparked off both the Macintosh and a bit later Microsoft Windows, but on machines that were quite a bit weaker in computer power and capacity than the earlier Parc-built systems. The result was quite a subset.

Dynabook, Magic Slate, Wild Card, and HyperCard

“Apple Fellow” Bill Atkinson really liked the Dynabook idea, and wanted to do a version that he called “Magic Slate”. Bill was a kind of genius along a number of dimensions, and one of these was a great sense for “fun”, “simple great ‘magic’ “, and the mind and emotions of “end-users”.

One idea he had for Magic Slate was that it acted like a stack of as many “pages” as you wanted, with the screen showing the top one, and you could search instantly to find pages to bring to the top to work on. The pages could hold any media.

When the first Mac happened in 1984 he did a little desktop app called “Rolodex” which just held text — you could type anything onto a simulated Rolodex card — and it would instantly search and find and show faster than you could hit the “next” button. This was a truly great facility along all “end-user” dimensions. One of the things he realized is that “retrieval” almost always over-retrieves and thus the main feature that is needed is the fastest “rejection” that can be done. Because of this, “Rolodex” was used for everything.

Meanwhile, he was working on a version of the Magic Slate media system, and eventually decided to do it for the Mac. This was originally called “Wild Card” — later, lawyers pointed out that it conflicted with already registered trademarks. So it became “HyperCard”.

The original internal version/demo of this had many of the features of the release version, but not end-user scripting. To cut to the chase of a complex story, Apple marketing didn’t want to put out HyperCard. I was also an Apple Fellow at the time and Bill asked me to help; we went to John Sculley and got him to understand the significance, and John took on the champion role to get HyperCard productized and out the door. I took the opportunity to get John to insist that it be made “end-user programmable”.

The “Sculley-blessed” productizing was an immense project and left nothing to chance.

Fortunately — I think — I was involved in other things so I was not around to ruin this project by insisting on a number of things that really had to be done right. The result was a masterpiece in many ways (and with quite a few things that were not done right … but the masterpiece was the point, not what wasn’t).

HyperCard was a complete breakthrough as a wide-spectrum end-user personal computing system. The “brilliant naivete” of the whole thing was better for end-users than what we had done at Parc (it used only a small part of it, but made it much more learnable and useful for the average user than what we did).

Outside of the company, HyperCard was quickly and widely adopted. A few years later Apple was able to count more than 4 million scripted applications done by different users in its “HyperTalk” scripting language. I think this is still the largest successful wide-scale percentage adoption of an end-user media system with real programming in history.

HyperTalk End-User Programming

I have a few theories about why both HyperCard, and especially HyperTalk were so highly adopted by end-users.

The “Rolodex” design was adopted and expanded — especially the automatic indexing and extremely rapid search and retrieval: in almost all cases it was still faster than an end-user’s hand hitting “Next”.

It was very easy to make “media” on each card — generally easier than PPT, etc.

The scripting was carefully modeled from the semantics and pragmatics of end-user interactions with the media. Many HyperTalk scripts were just a pseudo-English rendering of the user actions to accomplish the same goal. (There’s a real sense in which this is teaching the end-user some weak non-scalable ideas about programming — but Bill and Dan Winkler (rightly) only cared about getting the end-users to really write real programs, and this worked so well in HyperTalk.)

HyperTalk was not a great form for a programming language, and it wasn’t really English either. But a HyperTalk script looked like something you could understand and use, and modify when needed. This “gistibility” was the deep key in end-user-land.

HyperCard had dynamic context layers to which the scripts were attached. The top layer was scripts for buttons and fields, then for the card, the background (like a master in DTP), the stack of cards, the Home stack. And finally there were scripts associated with HyperCard itself.

Interestingly, you could also add stacks to this hierarchy to provide facilities like that of a code library. There’s also a bit of “object-oriented” feeling in this and in many other HyperTalk features.

A script was invoked either by an event (via an “on …” handler) or by a “message”. The scripts in each layer were shown as a vertical panel in one view. This made “simple things simple, and complex things possible”.
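The effect of those layers can be sketched as follows (an illustrative Python toy of mine, not HyperTalk’s implementation): a message starts at the button or field and travels outward until some layer has a handler for it.

# Sketch of the layered dispatch described above (illustrative only).
# A message starts at the button/field and travels outward: card,
# background, stack, Home stack, and finally HyperCard itself.

def send(message, layers):
    # layers are ordered innermost first, e.g.
    # [button, card, background, stack, home_stack, hypercard]
    for layer in layers:
        handler = layer.get(message)
        if handler:
            return handler()
    return None                       # nobody handled the message

button     = {"mouseUp": lambda: "go to next card"}
card       = {}
background = {"openCard": lambda: "fill in today's date"}
stack      = {}

print(send("mouseUp",  [button, card, background, stack]))   # handled by the button
print(send("openCard", [button, card, background, stack]))   # falls through to the background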

The ad hocness that made the first brushes with HyperCard feel so nice and enticing and ultimately successful did not scale well, and produced at least one manual from Apple that was almost 700 pages long! However, I don’t think more than a tiny few users even looked in the front part of this book. HyperCard’s success was due to how the first level of stuff looked and felt: i.e. it was a pure act of “end-user design”.

And … there was nothing about the successful parts of HyperCard that couldn’t be brought to full strength without violating its framework (unfortunately Apple marketing still didn’t like/love it enough to get involved — HyperCard was shipped with every Mac as a “value-added” and that was that).

The History: Success on the outside of Apple, Failure on the inside

HyperCard was first shipped in 1987. The large scale adoption had happened by the early 90s. Many “experienced hands” including yours truly urged Apple to make HyperCard the basis of an Internet wide media system (and this was redoubled when the WWW happened in 1992 and when the Mosaic browser was done the next year).

That Apple did not do this will go down as one of the great tragedies of personal computing. The company was deeply floundering, with no plans that would revitalize it (like really getting on the Internet to fulfill some of the other ARPA/Parc ideas).

Important

It’s worth noting that the current Web browsers to this day (almost 30 years later), do not come close to realizing the ARPA/Parc principles of “symmetric use and authoring in a WYSIWYG environment”. The kinds of authoring that are allowed are completely ad hoc, limited, almost never symmetric with usage, and rarely involve good end-user interface design.

To paraphrase Tony Hoare’s remark about Algol, “HyperCard was not just great for its time, but was a considerable improvement on its successors”.

New Ideas From Great Thinkers

In the ARPA-IPTO heyday of the 60s, a lot of really great people were funded to follow their instincts about Licklider’s vision. This produced many ideas, some of which were coherent — this community communicated quite well (the grad students and interns were the messengers) — and some were quite distinct (the creators were “distinctive” and ARPA didn’t care about coherency and didn’t have master topdown plans).

So there were lots of “Huh, I didn’t think of that!”s.

Parc was much smaller, but had people drawn from the ARPA culture and a similar desire for first class insights over coherence.

In the 80s — after all of this — things were rather different in so many ways.

But, still, there have been some of these special people with first class ideas that are independent of coherence. To mention just two: the way Bill Atkinson et al. thought about HyperCard and its scripting and packaging, and more recently the way Bret Victor has thought about a number of things, and most especially his take on both what Engelbart and Hypercard — and the ARPA vision — were all about when he created “DynamicLand”.

Really great people are “vitally interesting” regardless of what you agree with or not, what you like and what you don’t, etc. They make an entire world for the rest of us to witness.

That is the nature of Art and Artists.

Are DSLs a good way to represent complex object hierarchies?

This question seems to assume that “complex object hierarchies” are worth better representation schemes.

I think the more important question of good ways to describe “things that you would like to have instances of” is the one that needs to be treated first.

I have always thought of “objects” as they would — and need to — function in a running system, and have noted many times that there is not yet a great scheme for the descriptive semantically related methods to produce definitions from which the instances can be made.

Certainly, the single hierarchy of classes and subclasses is both too restrictive semantically and open to deep kludges pragmatically (for example, most languages with single inheritance allow a subclass to be nothing at all like the superclasses …).

Some of the classic AI expert system languages did better, and perhaps the most interesting and significant offspring of Smalltalk — PIE by Goldstein and Bobrow at Parc — had objects represented as “multiple perspectives on the different roles they might play, both operationally and definitionally”. This was more than a composition or trait scheme, and I liked what they did a lot.

There definitely needs to be a language to deal with relational definitions (and that includes constraints to meaning). The complexities that have to be noticed can be seen in the CYC system, which has one of the most comprehensive ontologies.

A key idea — which had its first experiments in the Flavors system at MIT — was the idea that — if you were going to do something like inheritance — you had to have something “algebraic” about it. I.e. a description of a relationship should be perspicuous enough to show the reader what the new category of instances would be like — this is especially the case for more realistic descriptions that possibly involve multiple kinds of relations that are combined.

If, instead, there is enough freedom to “hack definitions” — analogous to being able to mess with pointers at runtime with code — then “clever people” will eventually (and soon) make messes that are both non-understandable and dangerous.

How is the food at CU Boulder?

I graduated from CU Boulder in 1966, so hope there is no food from back then still around!

I was a “starving student” back then so my options were limited.

What does Alan Kay think about the Commodore Amiga (1985) as a multimedia computer or an appealing tool for artists? How does it compare with the Alto or Star, or with the Sierra machine Atari was working on while you were Chief Scientist at SRL?

I haven’t found a good history of all this — but there might be one somewhere. It is a very complex set of interactions and much of the context needed to get a good picture is beyond the scope of a Quora answer. Part of the complexity is due to the many idiosyncratic and aggressive personalities involved, and the zeitgeist of the entertainment industries in general.

Just one of many salient facts: in 1982 Atari’s yearly gross alone was more than the gross of the entire motion picture industry. Their profits were large, etc.

After leaving Parc, I became Chief Scientist of Atari in 1982, and was soon joined by Ted Hoff (the inventor and builder of the first microprocessor) as CTO. We were good friends, and the two of us together covered quite a bit of ground for futures for Atari.

Being part of Warner Communications (the parent company) was not unlike being part of the Roman Empire. In common with other entertainment companies — and perhaps even more so — the culture was distinctly “New York” and rather “Mediterranean”. I had lived in NYC and gone to high school there so I was somewhat acclimated to these ways of thinking and doing things.

Cutting to the chase, there were several computers under development at Atari designed to fit into the “post-Alto” world of the 80s (highly graphical, with GUIs like those at Parc etc.). Warner was not very interested in these because they thought they could milk the earlier generation of games machines forever.

I pointed out to them that there would likely be an inflection point when “first-timers” ceased to be the growth area of games. This happened in the second half of 1983, and suddenly they had arterial bleeding.

Meanwhile, Jack Tramiel, the head of Commodore — a guy who was extreme even for the gaming industry (his favorite saying was “business is war” and he meant it literally) — got dismissed/resigned from Commodore at the start of 1984, and took some of the people with him to his new company.

In early 1984, the people doing the Amiga (one of them was a former Atari star) ran out of money and came to Atari for a loan. Ted and I thought what they were doing was really good, and helped convince the powers to grant the loan (I forget exactly when — we made the case before I left in May, but I think the loan itself happened after I had left, maybe in June).

In June or July — I was safely gone — Tramiel did a no cash buyout of Atari. The way I heard the story, he thought he was going to get the Amiga as part of this. But Amiga wound up doing a deal with Commodore, to the immense rage of Tramiel. I don’t think I understand the complete details involved, but there were forms of revenge on both sides of these actions (I was now at Apple and paying attention to other things).

I should finish this with an opinion … Pound for pound, I thought the Amiga was quite a bit better than the first Macintosh. This is because the Amiga designers were thinking bigger thoughts on the one hand — they wanted to do more kinds of things — and because they were able to assess the Motorola 68K much more accurately — and because of their heritage from computer gaming. All of these added up to a manifest need for special hardware for assisting graphics, which was a must given the cycle time and relationship of the 68K to RAM.

It was “interesting” that Commodore did not understand what they had with the Amiga: their world view was much smaller than the designers of the machine. This really hurt the adoption of the Amiga in all the areas it might have flourished. This is especially “interesting” given that the several OSs done for the Amiga were generally quite good at allowing the potential of the HW to be exploited.

As yet another part of this complex history, it should be noted that IBM had successfully frozen almost all personal computer sales except its own when it promoted-and-dumped the PCjr, starting ca mid-1984 through early 1985, with an enormous advertising campaign and artificially low prices, and then: nothing. This almost killed Apple and the Mac, and it really hurt the Amiga as well (and the effect continued in the US). One result was that the Amiga was picked up and appreciated more in the UK and Europe than in the US.

As I said, even an attempt at a summary is too complex …

What does Alan Kay think about considering the creation of software as a design activity, putting it as a third item on the same level as science and art?

This is another case, where instead of a simple question, what is needed are “10 Things” and “20 examples”.

What does Alan Kay think about Problem Frames (by Michael Jackson)?

I try to avoid either praising or criticizing areas that I have not spent some time in dealing with the details. This is one of them. I remember reading at least one of Jackson’s books on his approach to Requirements (but I’ve been living quite a bit in London the last few years and don’t have this material at hand).

I do remember a few things from contact back then.

One was not being particularly interested in design schemes for complex systems that couldn’t be implemented, run and debugged. There are many reasons for this stance, some of which could be called “prejudicial”. Basically, I think it is important to have your design philosophy as a vital layer of your development and running system.

Another could also be called prejudicial: that trying to control more or less out of control mechanisms with a design scheme is analogous to and probably not much more practical than the ploys used to try to control large machine language programming projects in the early 60s. My thought then (and now) is that you are much better off just inventing a good VHLL (and to have learned the chops to implement one). In fact, a number of VHLLs were the result of this way of looking at things (for example, Smalltalk), where concerns, both in the large and in the small (e.g. automatic storage allocation and collection, module protections, etc.) are part of the “given world”.

Two of the good ideas I remember from back then were the insistence that the human part of an application be looked at first, and that decomposition should be parallel, not hierarchical. But that is what was happening in the ARPA/Parc communities 30 years earlier in the 60s and 70s — and especially for e.g. Smalltalk — so these ideas did not seem to add anything important to what this research community had already done.

I most definitely agree that being able to do “requirements” qualitatively better — and as part of the intrinsic design process — is one of the most dominant needs in software engineering today.

What does Alan Kay think about the following research proposal notes about a new software world? (https://osoco.es/thoughts/2020/06/notes-about-a-new-software-world/)

We definitely need a new software world …

I think I can give almost the same answers to both of your recent questions.

Two heuristics I’ve used for more than 50 years now when thinking about “something major”, are first, to try to “write 10 things” about it, and then to “write 20 examples”, ranging from “simple things that must be simple”, “major things that should be as simple as possible”, things that are difficult that will have to be made, etc.

Then there is a basis for discussion and further comment.

What does Alan Kay think about the development of software libraries adopting Seymour Papert's constructivism and "microworlds" concepts as described in these articles?

“Sure!”

I think I can give almost the same answers to both of your recent questions.

Two heuristics I’ve used for more than 50 years now when thinking about “something major”, are first, to try to “write 10 things” about it, and then to “write 20 examples”, ranging from “simple things that must be simple”, “major things that should be as simple as possible”, things that are difficult that will have to be made, etc.

Then there is a basis for discussion and further comment.

What does Alan Kay think of the "programming as interaction" perspective? (http://tomasp.net/blog/2018/programming-interaction/)

As the author of the blog — Tomas Petricek — mentions, Smalltalk 50 years ago had a very large number of the things he wants to do (and still does).

However, it is possible that he is not aware of earlier — and even larger along some dimensions — work by Warren Teitelman in Lisp at MIT, BBN, and then at Parc. I think he would have mentioned it if he was aware of this.

For example, you can get Warren’s (1965?) “Pilot” PhD thesis from MIT’s CSAIL historical collection.

What will be the potential of a Smalltalk system based on declarative programming instead of imperative?

Could it then be called a “Smalltalk”?

I think the key phrase here is “based on”, which is used instead of “including” (which it already does).

Still, I think that having a really well tooled and hefty subset of a language be “declarative” would be beneficial in many cases.

There are tradeoffs in all styles of programming, and the actual worth of these often relies on the goodness of the debugger. For example, much declarative programming has some of the properties and feel of a simultaneous equations solver (in fact this is a subset of declarative programming).

When the solvers can actually solve the system, it is like beneficial magic from the gods.

When the solvers can’t solve the system, they often can’t tell just why.
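A toy example in Python makes both halves of this visible (a brute-force search of my own, not a real solver):

# Tiny illustration: declare relations, let a brute-force search find values
# that satisfy them all. When it succeeds it feels like magic; when it fails
# it can only say "no" without telling which constraint is at fault.

from itertools import product

def solve(constraints, domain):
    for x, y in product(domain, repeat=2):
        if all(c(x, y) for c in constraints):
            return {"x": x, "y": y}
    return None                       # unsatisfiable (or out of domain), but no hint why

ok = solve([lambda x, y: x + y == 10,
            lambda x, y: x - y == 2], range(0, 20))
bad = solve([lambda x, y: x + y == 10,
             lambda x, y: x + y == 11], range(0, 20))
print(ok)    # {'x': 6, 'y': 4}
print(bad)   # None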

A few simultaneous relations are usually easy to think of when we try to program this way — but it is often difficult to get some kinds of dynamic systems to be synthesized from constraints (we may be wrong on the constraints, or on how they interact, or both).

Sometimes it is both necessary and difficult to provide “advice” to solvers …

What does Alan Kay think of a new Smalltalk system more focused on the problem than in the solution space (e.g., using the ideas proposed by Michael Jackson or David Harel)?

Any highly extensible language can be made into what was called a “Problem Oriented Language” (POL) in the 60s (and many of the languages back then were in fact POLs — even Algol).

Most languages can be programmed to exhibit any semantics, so I think the issues are the bread of the sandwich: (a) are the syntactical forms that can be made good enough to express the POL ideas, and (b) can the pragmatic levels needed be achieved?

Some of the variants of Smalltalk-80 can handle the pragmatics (because they include code for the actual VM and run-time system). The syntax scheme that Dan Ingalls worked out is pretty ingenious: it is both simple and quite readable, but perhaps not extensible enough by itself for some POLs.

However, the version of Squeak Smalltalk that we used in the naughts had a very nice implementation of Alex Warth’s OMeta as part of its live system, and this coupled with the late binding and live turtles all the way down of Smalltalk, would allow a wide range of POLs to be done (we did a variety for the STEPS project).

Are there other "it's turtles all the way down" programming languages apart from the Lisps and Smalltalks?

I’m not sure exactly what you mean by “turtles all the way down” here. It is sometimes used with regard to Smalltalk because every kind of entity in it is a Smalltalk object.

Using this as a guide, we could observe that in certain languages centered around “procedures and data structures” every entity could be some kind of procedure or data structure. Lisp would be one of these because its notion of “procedure” is strong enough to model control structures such as “if” expressions.
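For instance, here is a deliberately trivial sketch in Python rather than Lisp: once the branches can be passed as unevaluated “thunks”, even control structure is just another procedure.

# Toy illustration: control structure as an ordinary procedure. The branches
# are passed as thunks so that only the chosen one is ever evaluated.

def my_if(condition, then_thunk, else_thunk):
    return then_thunk() if condition else else_thunk()

x = 7
result = my_if(x > 5,
               lambda: "big",
               lambda: "small")
assert result == "big"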

Another way to think about “turtles all the way down” has to do with implementation. Very early in Lisp development appeared “LAP” (Lisp Assembly Program) and a Lisp Compiler written in Lisp which translated into LAP which assembled into machine code.

The first Smalltalk had a way to generate machine codes, but this was not used for implementing the entire language at that time. But there were very interesting and useful self-bootstrapping models built into later Smalltalks, and eventually complete routes to bare hardware — the work of Tim Rowledge is particularly noteworthy and should be looked at, as is the work of Dan Ingalls and John Maloney on Squeak.

Semantically, it is generally easy to represent dynamic languages in themselves; the largest amounts of work have to do with optimizations. For example, both Lisps and Smalltalks would often have a route that translated into C — as a kind of universal assembler — that would both be found on most CPUs and also would often be optimized. The advent of JITs — Peter Deutsch’s early one for Smalltalk should be noted — along with lots of extra memory that would conveniently hold unwound machine codes, made a huge difference.

At Parc — where we had microcodable hardware — we could tailor the HW to be “ideal machines” for VHLL languages. Here both senses of “turtles all the way down” were much more natural to do.

I have left out the “Turing Turtle” idea: that most languages can be used to make a scratch interpreter for themselves — or any other language.

What happened to the children exposed to Smalltalk at an early age in Xerox PARC, back in the 1970s-1980s? Did some of them grow to become great programmers/scientists. Do they use Smalltalk to this day? Or was there just no follow-up study?


There were no follow-up studies (we couldn’t even get funding from either Xerox or NSF to do the longitudinal learning experiments that we thought were crucial to the research).

And anecdotes are fun, but not much more than that. For example, the Goldeen sisters, Marian (started at age 12) and Rachel (at age 5) wound up going into math and science in college and then found their way to Apple as advanced system programmers, where I think they still are today.

Bruce Horn (around age 12) did many interesting projects at Parc, including the microcode for the Smalltalk virtual machine for the Dorado when he was 16–17, and then was one of the two designers and implementers of the Finder for the Mac while still a teen-ager. He eventually got a PhD in CS from CMU and has created several important programming languages, etc.

Etc.

Lawson English made a good point in a comment which I’ll reproduce here along with my reply:

Of course, a big question is: how were these kids selected? Just living in that part of the country gives kids a huge leg up over living in another part of the country.

Reply: This is why I didn’t make any special claims. In any case, our (Adele’s and my) model was that — in order to make any interesting claims — we really needed to show transfer via a serious longitudinal experiment that would take quite a few years. This didn’t happen.

Adele came up with a great and simple process for teaching the kids Smalltalk, and this was pretty much universally successful.

We worked with quite a few different groups, including two theoretical extremes (a) a “disadvantaged” group from East Palo Alto, and (b) a “super IQ” group selected by the school they were in.

Bottom line — and I think I mentioned this in the history I wrote — there was very little difference in the latency for understanding the different ideas. That was very interesting. The difference with the “high IQ” group was not less latency, but that they could really generate ideas of what to do next — they were very “live”: we rarely had to make suggestions.

A more interesting situation years later was with an entire busing school in LA with hundreds of children (the rules were that the school had to have the same racial and ethnic balance as LA itself, and the children were selected by lot). However, the principal had made this school — in very humble temp buildings on the campus of a regular school — into the best school I have ever seen.

So despite the random wide range of origins, the children were in fantastic mental and motivational shape. Going in we pledged ourselves a version of the Hippocratic Oath “Above all do no harm”.

This was an illustration that “a great adult” such as the principal — Roberta Blatt — can create a great environment not just for the children, but also for the parents of the children in their homes. (As I try to say in almost every talk I give on almost any subject: to generally improve things for both children and adults, you have to generally improve the richness of the culture around them — trying to stuff in little snippets of knowledge in isolation, only works for a tiny percentage of children and adults.)

Another great exemplar was an LA high school teacher I became very friendly with — Jaime Escalante (the “Stand And Deliver” guy). He was really fantastic (and even better in real life than the great portrayal of him by Edward James Olmos). And the real story was even better than the movie (which is a little smudgy about why he was so successful). Jaime Escalante - Wikipedia

He wound up being so successful that he was pretty much driven to resign from Garfield High School. Given how things actually work in the public education systems in the US, this is not at all surprising.

Why do many projects done by computer researchers (eg. Bret Victor or the VPRI) remain closed-source, even though papers get published about them?

Because many kinds of experimentation are aimed at finding out things, not to produce tools for others to use. This is especially true in the user interface area, where a large number of different experiments are required with different kinds of users, and there is often never a complete system actually built.

The results of these experiments can be summarized in both writings and videoed talks showing demos. There is almost always enough public information available to allow replication of the experiments.

And, in most cases, every few years, something major and public is released, or put on public view.

A question is often asked, “well why not make the code public anyway?” In practice this is generally completely impractical for many reasons, including just dealing with detailed questions and bug complaints about many obsolete versions, and that the kind of research that e.g. VPRI did is not at all incremental. Quite a bit of the work of this group over the years has been to start from scratch as often as possible. For example, there were 5 complete Smalltalk systems done at Xerox Parc at about 2 year intervals — each one tried different design ideas and required major implementations.

Squeak Smalltalk used the “Blue Book” definitions (which were public and open), but did a completely new “bottom” (which was also made public and open). The tooling needed for others to be able to use the open sources was done carefully enough so this was successful.

The “Etoys” system for children was done in Squeak and also made open source. The Croquet system was done in Squeak, and one of the spinoffs was “Open Cobalt”, a non-profit open foundation.

Basically, seasoned researchers, especially on the fringes of knowledge, have to decide how to use their limited amount of time on the planet to best advance the research, while still holding to the scientific ideals of sharing what is powerful and interesting about the results. I think Bret Victor, VPRI and many other deep research efforts have done well at finding good balances for this.

What would be the necessary elements for setting up a biomedical equivalent to PARC?

Before I write too much, I need to know if you are familiar with Janelia Labs (near Dulles Airport in Virginia).

https://en.wikipedia.org/wiki/Janelia_Research_Campus

Or any of the other research organizations that Sidney Brenner helped set up … ?

He had a sure sense of “how to go about things”. He was one of my heroes when I was in molecular biology (briefly before getting seduced by computing), and years later I had a very long chat with him in 2004 when we both happened to be at the Kyoto Prize festivities.

Why were nearly all the cases of early computers made of plastics?

“Early” as in “actually early”? As in 1945 to the 1970s? Or “early” as in “relative to your age”?

“Actually early” were basically metal as far as I know and mostly by direct experience. The Control Data 3600 (ca 1963), on which I spent a few years, had a very beautiful greenish glass and metal case …

image

What should be learned to deeply understand the 20th and 21st centuries?

Eric Naughton’s answer is very much pointed in a fruitful direction.

A more general observation is that most human beings since our species started prowling the planet have been content with made up stories about themselves and their kind. This is partly because we are genetically wired for being part of a culture — we very likely co-evolved via the cultures that existed before us — and a large part of this has to do with how we use language and the stories we can fashion from it to provide what seem to be explanations about our situation.

The heart of the paragraph above is Anthropology, some of whose branches are many different perspectives on psychology, social behaviors, and most recently “bio-behavior”, neuroethology, and “behavioral economics”.

One of the most important processes we do is the opposite of how we generally think about perceiving the world: we instead project our beliefs onto the world as part of what we think of as perception. This can lead to real trouble, and has many times over the millennia.

There are several hundred additional serious thinking glitches that have been identified, all of which indicate that it is a full time job to hold onto an even barely accurate view of the world as revealed by the sciences. To pick just one of the 200, consider how blind we are to almost all scalings …

As with careful study and modeling of the heavens overturning our story-view that the rest of the universe revolves around us, the more recent studies of our species have overturned most of the myths we’ve made up about ourselves. But most people still adhere to their local myths, in part because the study of our species is not only not taught to children, in many cases it is against the law.

The genetics that we have are not much changed from 100,000 years ago and some of this helped us survive, and what was against progress was not negative enough to do us in (e.g. anger expressed with a rock or a spear would not kill a village). Once scaling started to happen and genetics did not change to match up with it, even the scaling of one person to a band of warriors could kill a whole village and larger scalings of organization and technologies could kill whole territories.

If the worldviews of people about their situation and about “the other” — their stories treated as “normal” and as “reality” — are allowed to remain as they were during hunting and gathering small tribe days, but now in a world of vast powers, reach, etc., then we have reached a point where much of what helped us 100,000 years ago now mostly hurts us, and what was against progress back then is terribly dangerous today.

These thoughts are not at all new — but they are generally still foreign to most people today. In the study of humans, there is little that we do today that is at all surprising in the light of what is known — but it is still done, and is still surprising to most.

On a slightly smaller scale we can see that the tragic responses to the pandemic — which has a contagion and prevention pattern that was completely understood long before COVID-19 happened — show just how little of what is actually known about us and the world we live in has been successfully taught to and learned by most humans over the last few hundreds of years.

The climate crisis was identified in 1963, has been verified many times, is now an actual war in which invasions are continuously happening, and yet the story world of most people can still not allow what’s actually going on to be thought about. Schools like to “teach to the test” — a kind of stupid idea — but the real tests are horrific disasters that require very different kinds of thinking than our bare genetics endows us with. Unless we can learn the “uncommonsenses” of the sciences and the ways of thinking that have to be learned to deal with these ideas, humanity will flunk the real tests (and the grade given out will be quite a bit worse than an “F”).

When you say, "the computer revolution hasn't happened yet" do you mean that in the sense that, "the printing press hasn't been invented yet" or that, "the printing press has been invented but it will take time before a, "literary culture" emerges?

I’ve done a number of talks and written a few essays on this subject — these will give a more expansive answer.

I usually invoke a number of “thinkers about writing and printing” such as Marshall McLuhan, Frank Smith, etc., to set up what I think is a useful analogy. Four of the big ideas about writing (and reading) are (a) that many writing systems are powerful enough to be able to express important ideas in ways that others can learn to read, understand and think about, (b) that these expressions of ideas can travel far and live longer than their writers, (c) that the fluent learning of reading and writing causes important qualitative changes in human mentalities, and (d) that these changes in turn raise the quality of thinking and ideas.

As Frank Smith pointed out, literacy is thus not just about reading and writing but about having a literature, and Marshall McLuhan somewhat cryptically but powerfully explained that the power of any medium for communication is what you have to become in order to fluently use it.

Writing via hand copies made a difference in “civilizations” but not for most of the members. The printing press spread many of the same plus new ideas around in profusion. As McLuhan pointed out, the initial subject matter of any new medium is attempts to reuse the old: the two best selling books in England after the press for almost a century were the Bible and an astrology book. But in the 17th century, about 150 years after the press was introduced, very new thoughts at a large scale started to be written, distributed and read. A good tome about this is “The Printing Press As An Agent Of Change” by Elizabeth Eisenstein.

Two who saw early what printed writing was going to likely be about were Erasmus and his friend the Venetian printer and writer Aldus Manutius. They realized that books would be inexpensive enough to be portable, and Aldus determined one of the sizes of books we have today by measuring saddle-bags in Venice and printing books that would fit in them (and inventing smaller fonts that would be very readable, etc.). They also realized that printed books would allow more involved arguments, and that you could refer to what you had said and were going to say if you had page numbers in books (the Jewish culture of the annotated Talmuds had already invented these for the very same purpose).

It was the gradual change in the nature and the subject matter of argument that brought forth the 17th century (and our modern world). This was the “printing revolution”.

Now, when something like a computer comes along, those who have outlooks like Erasmus and Aldus — especially if they know about these two and the history of ideas — will immediately see that one way to look at it is as “a book that can read and write itself”, and that its dynamic nature is well suited to simulation of models, and that this will permit “argumentation of a very new sort” about “very important ideas” that will likely “lead to new ideas that can only be represented and reasoned about if you have a computer”.

My research community was ARPA-IPTO in the 60s (to which Xerox Parc was added in the 70s), and many in it “had the outlooks of Erasmus and Aldus”, and the community as a whole did. McLuhan was not unknown, and it was realized that general humanity would likely react to the computer as he had pointed out about previous media: by trying to install the old content in it and to not look for the new.

My research community wanted to invent much better ways for humans to think that would go far beyond what came forth to that point. And I think we made a good start at it, including trying to get the educational institutions to pay attention to the differences that were likely to come about. This was especially true at Parc, where we invented and made whole systems partly to show much more than demos “to avoid 100 years of bumbling with more convenient forms of old media as the main focus”.

The 1980s and 90s saw the commercialization of many of the ARPA-Parc ideas, and these almost entirely went as McLuhan would have predicted rather than the way we had hoped. To the point that the “qualitatively new forms of argument” and the “new literacies” are almost completely confined to science, and engineering (especially the former).

For example, the first good models of an important idea that really needed a computer to represent dynamically to argue about concerned the rising CO2 concentrations in the atmosphere and what these might mean. The earliest warnings backed up by good science and needed computations were issued in 1963, and were heeded by many in the scientific communities who could see how powerful the combination of science and computation could be.

This has been essentially completely lost to the public to this day, and unfortunately including almost all politicians and other “people with power”.

On the other hand, today billions of people in the world do have personal computers (including smartphones) that are millions of times more powerful than needed to “argue about climate, pandemics, and many other critical issues and ideas”, but instead these computers are used as “conveniences for the older media that people have gotten used to and will buy”. This is very like the analogy to “the Bible and astrology” after the invention of the press.

This is why I gave talks and wrote essays about “The computer revolution hasn’t happened yet”. The lag in uptake with the real fruits of the press is quite understandable: there was nothing really to use as an analogy. But not to be able to look at the computer — given the analogies to the past — can be thought of as one of many massive failures of public education, to which can be added the failures regarding the climate and the much simpler to understand dynamics of contagious incurable diseases, and what should be even easier to learn about: the dynamics of scaling itself.

How did they program the first computer knowing that programming a computer requires a computer?

There’s writing the program, and there’s getting the program into the computer in a form that will run it.

In “the old days” — not just in antiquity, but well into the 60s — most programs were written on paper before being punched on cards or typed into a file. Then a program in the computer would read a card or a file, and start converting it into machine code of some kind.

To get started from “dead” there was usually some way to enter a short program in binary by hand. This would then read the first card or record on a file or paper tape.
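Just as a toy sketch of that bootstrapping step (not any real machine’s loader, only an illustration of the shape of the idea): the short hand-entered program does nothing but copy the next record into memory, after which control is handed to what was just loaded.

    memory = [0] * 32

    def read_card():
        # stands in for the card/paper-tape reader; the values are made up
        return [7, 7, 7, 0]

    # the short hand-keyed loader: copy the card into memory starting at address 8
    card = read_card()
    for i, word in enumerate(card):
        memory[8 + i] = word

    print("loaded:", memory[8:8 + len(card)])
    # ...a real machine would then jump to address 8 and run what was just loaded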

For more detail, here is another answer I did for Quora:

https://www.quora.com/How-were-computer-programs-and-languages-like-COBOL-installed-into-computers-back-in-the-50s-before-software-1/answer/Alan-Kay-11

How much about physics, mechanics, and electronics should a computer scientist know about?

I think it’s a very good idea for computerists to have done the work to attain deep fluency in a well established scientific discipline (Physics is a good one). Most people in computing use the “science” in “computer science” more like “library science” than in any modern sense of what “real science” is all about. The lack of this understanding has really hurt progress in computing since the big explosion of the field starting in the 80s.

Similar remarks can be made about the “engineering” in “software engineering”. Most people who claim they are doing this have very little idea about the levels that “real engineering” goes to in the established engineering disciplines. (And these days you can’t be a “real engineer” without gaining considerable fluency in “real science”.)

How often does Alan Kay find that the real problem is found at a level of abstraction that can’t be addressed by individuals, such as ‘the human condition’ or ‘societal values’?

I’m not sure about all of the question, so please comment to explain more.

Generally speaking the “real problem” is almost always “the human condition” and/or “societal values”.

But these problems can be addressed by both individuals and groups — for example, in trying to improve levels of learning and “enlightenment” in the general population and also in those who have gotten power in one way or another.

Were all Wirth languages Algol-like (e.g. Pascal), or were any of them Lisp-like, or even something completely different?

A fun question!

Adriaan Van Wijngaarden was one of the earliest and most colorful computer scientists concerned with programming language design. He was on the Algol committees, and was a bit of an agitator, and especially in the directions of generalization and simplicity. John McCarthy was also on the committee and contributed many ideas, many of which were in Lisp, but only a few made it (such as conditional expressions).

A landmark 1962 paper of van Wijngaarden’s — “A Generalization of Algol” — showed how you could retain the semantic powers of Algol while making the syntax vastly simpler. Much of this was essentially moving Algol towards Lisp. For example, many special constructs in Algol could be rewritten as functions — even in Algol itself, because it had “call by name”, which delayed evaluation of an expression and retained the environment the expression would need when evaluated. So the conditionals and looping constructs in Algol could be replaced, etc.
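To make “call by name” concrete, here is a minimal sketch in Python rather than Algol (only a rough imitation of the mechanism), where explicit zero-argument “thunks” stand in for the delayed expressions, so that ordinary functions can behave like Algol’s special conditional and looping constructs:

    # Call-by-name imitated with "thunks": zero-argument functions passed in
    # place of values; the callee decides if and when to evaluate them.

    def if_then_else(cond, then_thunk, else_thunk):
        # an ordinary function acting like the conditional construct:
        # only one branch is ever evaluated
        return then_thunk() if cond else else_thunk()

    def safe_divide(x, y):
        return if_then_else(y != 0, lambda: x / y, lambda: float("inf"))

    print(safe_divide(10, 2))   # 5.0
    print(safe_divide(10, 0))   # inf: the division thunk is never evaluated

    def while_loop(test_thunk, body_thunk):
        # a looping construct written as an ordinary function
        while test_thunk():
            body_thunk()

    n = [0]
    def body():
        print(n[0])
        n[0] += 1
    while_loop(lambda: n[0] < 3, body)   # prints 0, 1, 2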

That 1962 paper influenced Wirth when he was a grad student, and his first effort at Berkeley was “A Generalization of Algol” (CACM 1963) — not as general as Van Wijngaarden’s, but “Wirthian” in erring on behalf of the implementor rather than the programmer.

Wirth’s next effort (with Helmut Weber) was to try to make a complete higher level language generator whose result would have the power of Algol (or more). This was “Euler” (CACM Jan Feb 1966), and I think it was Klaus’ best work.

This was “going towards Lisp from the Algol side”. Also going on in the 60s was “Lisp 2”, which was an attempt to “take Lisp towards Algol” (this was much later done really well at the Stanford AI Project as M-Lisp (David Canfield Smith), and Tony Hearn’s REDUCE).

And also going on in the 60s was the work of Strachey, and especially Peter Landin, who showed how to really use lambda calculus to model the semantics of programming languages, and especially Algol. (These were very rich times!)

The basic idea here was that there was no reason not to have all cakes and be able to eat all of them. The trick was to deal with how expressiveness traded off against extreme generality.

The syntax “compiler-compiler” of Euler was the weakest part, but it could easily be done in other ways, and especially could be done in Euler itself.

The virtual machine for Euler was modeled on the Burroughs B5000 (and would be called “byte-coded” today). Weber got this going in the somewhat forgiving microcode of one of the smaller IBM 360s (which used microcode to emulate 1401s, etc.).

I was very excited by this — it was pretty darn clean and simple — and I used as many ideas as I could for the FLEX Machine that Ed Cheadle and I made. Flex was kind of an object-oriented Euler, with several different kinds of simple translators, written in Flex.

These techniques were later used for Smalltalk at Xerox Parc.

Klaus liked to be “practical” so he started to abandon these ideas for less wonderful ones.

But a great end to this story is that he eventually wound up visiting at Xerox Parc, saw how we dealt with “practicality” by making machines suited for higher levels, and decided to do one of these himself. This was the genesis of “Lilith” and eventually the “Oberon” system.

Should writing code be fun?

I wrote this in a comment to Ian Joyner’s answer. Here it is as an answer. Ian used a great quote by Al Perlis (the first Turing Award winner) about “Computer Science”:

"I think that it's extraordinarily important that we in computer science keep fun in computing. When it started out, it was an awful lot of fun. Of course, the paying customers got shafted every now and then, and after a while we began to take their complaints seriously. We began to feel as if we really were responsible for the successful, error-free perfect use of these machines. I don't think we are. I think we're responsible for stretching them, setting them off in new directions, and keeping fun in the house. I hope the field of computer science never loses its sense of fun." ~ Alan Perlis

Bringing Al Perlis’ comment up to the present time, where “Computer Science” is kind of a sloppy term for “pretty much everything”, instead of the very strict sense of “Science” (as in “Real Science”) that he had in mind …

Note that in the larger world of highly developed disciplines, we want the airplane we are flying in to be designed by serious engineers, whose version of the Hippocratic Oath is “The bridge must not fall, the airplane must not crash …”, rather than scientists or mathematicians who will generally learn more from a crash — and from playing around in general — than when everything goes well.

The “fun” of engineering — and boy it really is fun to design and make things that really work well and help humanity along — is very different from the “fun” of science and math. Having “fun” is very much being able to do what you choose to do really well.

When, and in what degree was today's level of abstraction in programming predicted?

If you explain just what you mean by “abstraction” here (and “level”), I will be happy to attempt an answer.

Have you ever studied optical illusions to use similar principles in your UX designs?

I have studied them quite a bit — as well as many books by research psychologists, including those by Richard Gregory, Roger Shepard, etc. But not to use the principles in the UI designs per se; rather, as one of the most available routes to understand and test the many ways “our minds don’t work very well” — or to paraphrase Francis Bacon, “We humans have bad brains”.

I did a lot of theater in my early 20s and was also studying Anthropology as a minor concentration (or as Jerry Bruner would put it “cultural psychology”). A lot of what is done both on the stage and with music is a kind of magic act where your designs and portrayals are encouraging the audience to use their brains and their imagination on behalf of what you would like them to think they see and hear and feel.

When I started to invent and design UIs in the ARPA/Parc community, I realized that one very powerful way to make progress was to consider the human-computer relationship as “theatrical”, and to create an environment that would harness the human user’s intelligence and instincts to explore.

I called what we were designing: “the user illusion”, and I think that term is still in use.

I searched YouTube for Alan Kay with Donald Knuth and found nothing. How does Alan Kay understand the essence of literate programming? Is it to design code like you would have to teach it?

Don and I have been friends for more than 50 years now. The simple summary of “literate programming” is that “Don is amazing”, both in “many ways” and in how he goes about programming, and what he thinks a program is.

He wrote a nice short paper and example for a book about programming by Doug McIlroy that got many prominent programmers to write a program for a problem — I think it was text justification or some such — and also comment about it.

A much longer, and very illuminating example, is the “B” book for TEX.

Don thinks of the form of a program as a kind of essay that has to be as comprehensible as possible to both humans and computers. The burden of this is on the human programmer, who has a set of tools to help create the composite document — this includes an “extractor” that extracts and stitches together the programming language text in a number of ways.

He has generally not been terribly interested in VHLLs, in part — I think — because a well explained MHLL is pretty darn understandable in considerable detail (at some point the “wealth” of detail starts to become a tax).

Don started programming in the late 50s (when many of us old timers did), and the algorithms and data structures styles of those days produced programs that were somewhat “narrative” in structure, and Don’s “Web” tools for “literate programming” facilitate narratives.

As one moves from procedures and data structures to systems, especially at scale, the narrative structure quickly loses many of its benefits (you can tell the story of a few people in an apartment building in New York City, but trying to describe the city requires a very different kind of approach).

There have been experiments with organizing “narrative threads” through system descriptions, none of which, as far as I’m aware, quite delivered enough value. Since we humans are organized around stories and narrative, finding ways to bridge this gap for systems would not only “be nice”, but I think it has to be a critical part of making further progress.

Is it a good way to get an indirect tip on an invention by looking at a software program running, that is morphing into different shapes? If easier to get ideas this way than language, does that prove our language is too simple and must be developed?

I don’t quite understand the first part of your question, but as to the second part, there is quite a bit of evidence that we have what one of the inventors of cognitive psychology — Jerome Bruner — called “multiple ways of learning and knowing and remembering and thinking”.

He picked three of them to be deeply studied. These were

“Enactive” — how we learn and know and remember and think by “doing”, “touching”, etc.

“Iconic” — how we learn and know and remember and think via “configurations” in a kind of spatial context (includes visual and auditory).

“Symbolic” — how we learn and know and remember and think using symbols of various kinds to stand for ideas.

Some of the previous work along these lines had been done by Piaget, and then by the mathematician Hadamard (in “The Psychology of Invention in the Mathematical Field”, a survey of the top mathematicians in the world ca 1945).

I used this “multiple mentalities” idea as part of the foundation of the Xerox Parc GUI design, and it has many implications for curriculum design as well (see Bruner’s books, especially those from the 60s).

I think the answer to your question in this light is that there are many dimensions on which to find ideas, and it’s a good idea to use as many as possible.

What are the differences between social science, natural science, and humanities?

Several good answers here already, so this will attempt to add more perspective.

Our genetically created brains are set up to be social, and have propensities for learning language and our surrounding culture, including stories — our dominant way to remember, communicate, and make sense of things. All this activity goes on in our minds, which are the processes created by our brains.

Even though we are only “thinking in between our ears”, our minds perform the trick of projection, which seems to put the part of our consciousness that is modeling the outside world out into the world (so we think we see and touch “what’s out there” rather than having all the action take place internally).

This is easy to reveal e.g. through some “optical illusions”, such as “size constancy”, in which comparing (say) two oranges or poker chips, one twice as far away as the other — so that the farther one shows up on our retina at half the size of the other — conflicts with our knowing — meaning “believing” — that they are both the same size.

The resulting “story” that is made up from the direct phenomena and what we believe, is often a compromise — in this case we “see” the twice as far away object as about 80% the size of the closer one, rather than 50%. In other cases, we may only “see” what we believe, and not what is there at all.

One way to think about this is that we are in a kind of hallucination — a “waking dream” — even when we think we are “fully awake”. Another way to think about this is why “stories”, and especially “theater”, work with us.

And, we should be able to see why it took modern physical humans — 50,000 to 100,000 years on the planet — to finally invent real science in a way that stuck only about 400 years ago. This long delay is because we are basically satisfied with good stories as explanations, and our projections tend to make us see our stories out in the world. This is how we have existed (and how most people exist today).

The story framework is deep within us, so what has basically happened with regard to “humanities, science, social science” is a progressive “being more careful with stories” (for some stories) over the last 2500 years. Both logic and its powerful offspring, mathematics, enforce much more internal consistency; in addition, science requires considerable work with learning how to gather phenomena “pre-story” in order to start to create maps into our language apparatus that will allow “careful stories” to be reasoned out. This progression also requires progressively more effort, both in learning, and in delving.

The result today is that still only a relatively small percentage of humanity starts learning how to move from the quite magical unsophisticated worlds of our genetics and the traditional cultures which have co-evolved together.

Within these efforts, all the different ways to try to think about us-and-our-thinking are extremely valuable, and not at all exclusive. The human tendency to desire “one true way to understand” does cause overshooting of the methods in each of the areas.

For example, we find over the years divisions of the humanities making claims about parts of the universe where no good methods exist, and we find this in turn also for mathematics, and for science, etc. With regard to the latter two, Einstein said this in a famous lecture in 1921: “As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality.” In other words, quite a bit of care is still required here, especially when mathematics is being used as one of the languages to represent models in science.

A good rule of thumb in thinking is “The softer the area, the tougher you have to be”. People had to get somewhat tough to break out of the 10s of 1000s of years of story explanations, but once they did, the physical universe started to help, especially, early on in Physics, by being pretty nicely consistent with respect to careful observations, and stubbornly resistant to our own beliefs. Chemistry required a lot more work, and it took Biology all the way into the 1950s to come up with a complete enough fundamental theory and methods.

Part of the nature of the physical sciences is the line often heard in first year physics “Assume a spherical cow” (for calculating air resistance and terminal velocities, etc.). In other words, approximate methods work surprisingly well in the physical sciences, partly because much of the epistemology of science is based on admitting the existence of many kinds of errors — both in measurement and our noisy thinking — and inventing larger methods that do not allow the errors to obscure some of what nature seems to be doing. (There are important analogies here to the much simpler processes used in every communications and storage and computing system today to “embrace the errors that happen” with methods which restore the original intentions.)

Where the physical sciences try to escape from our “waking hallucination”, it is not a large exaggeration to say that the centers of the social sciences and humanities are devoted to dealing with us and our quirks directly (though most would not define themselves quite this way). The field of “bio-behavior” is currently a rather small overlap between all three areas that is now slowly starting to grow (there are other names for related fields) where the object of study is “scientifically why” we humans behave as we do.

The “softer nature of this nature” has required real toughness on the part of anthropologists, psychologists, etc., to somehow “be scientific”, partly by trying to prevent beliefs and desires about human beings from creeping into “conclusions” and “results”. This is very difficult, and claims have to be treated with care. Still, this is arguably the most important area of human knowledge to try to improve and grow.

An interesting deep area of study is how emotions are intertwined with our communications and beliefs. Quite a bit of what we consider “fundamental meanings of existence” is wrapped in how we feel — beyond what we conclude — and it is an interesting question whether the gradually more sophisticated understanding of how emotions work produces enough illumination where it is most important.

Music is a very good example. At its heart it is about feelings and alternative ways to express them. A famous quote is: “Talking about music is like dancing about architecture”. And to wind up where we started — with a story — the 19th century composer Robert Schumann was asked after playing a new piece what it was about. He thought for a second, sat down at the piano, and played it again.

Are there any areas apart from user interfaces in which the Model - View - Controller (MVC) pattern could be applied in spirit?

Sure.

I wrote a paper in the late 70s or early 80s dealing with the idea that “everything” — and more — in the kind of object-oriented programming we were doing could be re-created in very useful ways by generalizing “viewing”. This included making objects, inheritance, delegation, etc.

I don’t think I have a copy of this paper, but it was for an “Infotech State Of The Art Conference”.

It’s worth thinking about what the terms could actually mean as abstractly as you can …

Why can't programming languages be written in plain English?

My boss at the National Center for Atmospheric Research in the 60s — Glenn Lewis — in the 50s was a Princeton grad student of von Neumann’s, who — among his many personality idiosyncrasies — was famous for pooh-poohing “higher level languages”.

A great story from Glenn: one time when asked about this, von Neumann summoned a grad student and said “write me a sort program”. The grad student went off, and von Neumann said “The grad student will either come back with a sort program or he won’t come back at all — that’s ‘higher level language programming’ !”

This should also be all that is needed to figure out the answer to your question.

What does Alan Kay think about the paperclip maximizer thought experiment?

Of course, this is a very old idea in classic science fiction — that AIs, robots, etc., could destroy humans if that would help some goal of theirs.

But consider e.g. Nazi Germany as “intelligence destroying humans as part of goals considered more important”.

Or consider e.g. the climate crisis of today as “intelligence destroying humans as part of goals considered more important”.

An “intelligence” doesn’t need to be “artificial” in order to be a deep existential threat.

Why do we use digital computers for machine learning, AI applications, and natural language and speech processing, if our brains are analog not digital, would it be better if we use analog Computing to mimic our brains?

“Digital” is “Analog handled carefully”. Quite a bit of work in design and technology has to be done to make the noisy voltage levels (or any other means of transmitting a signal) be interpreted as discrete values.

A high percentage of the bits on CDROMs and DVDs are there to provide various kinds of redundancies to protect the “information bits” and themselves (the “protection bits”). This has to be combined with processing to recover the original intentions.
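As a toy illustration of the general idea (this is not the actual coding used on CDs and DVDs, which is far more elaborate): a few extra bits plus a little processing can restore the original intention even after some of the marks are corrupted.

    # Toy "protection bits": repeat each information bit three times, then
    # recover the original by majority vote even if one copy per triple flips.

    def encode(bits):
        return [b for bit in bits for b in (bit, bit, bit)]

    def decode(coded):
        out = []
        for i in range(0, len(coded), 3):
            triple = coded[i:i + 3]
            out.append(1 if sum(triple) >= 2 else 0)
        return out

    message = [1, 0, 1, 1]
    coded = encode(message)
    coded[1] ^= 1                      # noise flips one protection copy...
    coded[9] ^= 1                      # ...and another, in a different triple
    print(decode(coded) == message)    # True: the intention is recovered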

One of the reasons for “clocks” in digital computers is to try to isolate the times when a bit can be interpreted with very low error from the times when its value is not easy to determine.

So we can take the analog world and design it to be discrete. But we can take what’s on offer in the discrete world and use it to simulate analog to levels far beyond the signal to noise of analog. This is what is done in the careful sampling of audio in both time and amplitude.

And we can also make hybrid devices that use both. For example, the first ever optical mouse was done by Dick Lyon at Parc using a custom chip he designed and made which had an analog section for rapid computing where perfect accuracy was not important, and it had a digital section where things needed to be more nailed down (I think there are some Machine Learning chips that use analog values because the ML schemes are correlative and don’t require perfect accuracy).

What was the first computer with a graphical user interface?

It is really annoying to have “someone” change a question after it has been answered — especially if the answer no longer fits the new question. That is what happened here. The original question asked about “personal computers” and my answer is below. Now the question says “first computer”.

I take a GUI to include being able to point at the screen for both manipulating graphics and to issue commands/requests.

Sketchpad in 1962 definitely had a GUI, and it was implemented on the TX-2 computer at Lincoln Labs (this was a very large research computer for the SAGE system). Most commands were still given using separate physical buttons.

Graphical outputs were possible — and used for games — on both Manchester machines and on the EDSAC, all from either side of 1950. But I’m pretty sure that they didn’t have pointing devices.

The “light gun” — for pointing — was invented by Bob Everett for the MIT Whirlwind computer in the early 50s, and the combination of line drawing CRT and light gun was used in “Whirlwind II” — which was the SAGE system — and there it was used to point to displays on the screen to indicate which plane was of interest. I think that this was the use of pointing for a parameter, but that the commands were entirely given by physical buttons.

I don’t know the extent of possible GUIs on the earlier Whirlwind computer — it’s possible: they did many other interesting first things with interaction.

So: certainly Sketchpad in 1962 on TX-2.

---

That’s a good question (some people require the “personal computer” to have been commercially available).

But I think this is like requiring Engelbart or the RAND corporation to have been in business to have been important parts of the history of GUIs.

There were “graphical user interfaces” before the Parc Alto, but they were on much larger machines — the experience was “personal computing” but there was not a “personal computer”.

The Xerox Parc Alto was a personal computer, and it was available outside of Xerox. Several thousand were made and quite a few were donated to universities and to the US government.

If we think of “GUI” as meaning overlapping windows, icons, pointing devices, bit-mapped screens, etc., then I think choosing the “Xerox Parc Alto” as the answer to the question would be quite accurate.

I heard today that there are some advantages of using a formal language to write a software specification. What are they?

All computer languages are “formal languages”, so we want something that will allow us to express the meaning of what we want without pragmatics (unless part of the meaning has requirements for certain pragmatics). We’d like it to be as clear as possible, and we’d like to be able to debug it as well (a spec that is not debuggable might not actually represent what is desired).

For an example, let’s look at sorting, and use a formal language that is like Prolog with a more intuitive syntax. Sorting can be defined as:

sort A is B if B is-permutation-of A andif B is-ordered.

Let’s suppose our language already has definitions for “permutation” and “ordered”. In a Prolog-like language this will both check A and B, and also will try to generate B if it is undefined (this is not pretty, but it will do it).

So, our little specification language is both providing a definition, and a semantic check on what we intend by “sort”, and it is also providing a running debuggable simulation of “sort” that can be used in a larger spec/prototype.

(Note that the makers of the web and Quora have decided that I can’t run and test this …)
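For anyone who wants to try the idea anyway, here is a minimal executable reading of the spec, in Python rather than the Prolog-like language above; it is only a sketch of the generate-and-test behavior described (“not pretty, but it will do it”):

    # "sort A is B if B is-permutation-of A andif B is-ordered."
    from collections import Counter
    from itertools import permutations

    def is_ordered(b):
        return all(x <= y for x, y in zip(b, b[1:]))

    def is_permutation_of(b, a):
        return Counter(b) == Counter(a)

    def sort_spec(a, b=None):
        if b is not None:                    # checking mode: are A and B in the relation?
            return is_permutation_of(b, a) and is_ordered(b)
        for candidate in permutations(a):    # generating mode: find a B (slow, but faithful)
            if is_ordered(candidate):
                return list(candidate)

    print(sort_spec([3, 1, 2]))              # [1, 2, 3]
    print(sort_spec([3, 1, 2], [1, 2, 3]))   # True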

The next steps would be to — separately — start writing “sorting cases” for the many different ways we know how to do sorting. These cases could be headed by the conditions that would select which method to use, e.g. given the size of a particular set, and/or the longevity of the set, or the frequency of resorting, etc.

We can see that a really good “spec”/“requirements” language would make a big positive difference if we could get a running prototype out of it. This might require a supercomputer, but it is standard practice in other branches of engineering to do as much work as possible with the semantics of models to get the design as intended.

In practice, not enough smart people have worked on this problem over the years, and the funders never got interested enough in it. I think it should be one of the “grand challenges” of the next few years in Software Engineering.

Were Xerox PARC systems like the Star robust for third-party application development?

As the question is phrased, the answer would be “yes”.

However, at least in the early and main phases of the Star (and I’m guessing: all of them) there was no intention that anyone other than Xerox would do applications for the Star.

The robustness came from the ways the Star, its OS, its UI and its programming language were designed and built. In a word: “principled”.

My research group and I were not a part of the Star development (although the UI design “notions” for Star were first suggested and prototyped in Smalltalk by David Canfield Smith). I was friends with many of the Star designers, and got to witness various stages of this process.

(I should mention here that I didn’t agree with the general approach that was taken for the Star — both HW and SW — but since I was mostly interested in getting on to personal computers for everyone, especially children, I didn’t have an emotional stake in the Star.)

But the deep — and mostly successful — attempt to be “principled” about everything was interesting to watch, and it produced some really thoughtfully worked out documents.

I thought the most effective and important ones were about the Star UI, and some of these might be available. There is a good retrospective document by Bill Verplank that is worth reading.

I don’t think there was any connection, but the later “Macintosh User Interface Guidelines” document — mostly by Chris Espinosa (I think) — was a really good set of principles about the Mac UI and what developers must do to create a “MacApp” (and there was a development system of the same name that helped developers make UIs for their products that would be in accord with the guidelines). Chris Espinosa was also a great bulldog and knee shooter — a kind of hero, actually — who made an enormous difference to how Macs looked and felt in their early days.

Back to the question, the Star OS, Language, and IDE were all heavy duty designs done by experts, see:

https://en.wikipedia.org/wiki/Xerox_Star

There were several software “religions” (or at least belief systems or “paradigms”) at Parc, and the Star system was done using a different one than the one my research group had been evolving. I thought then (and now) that our lighter weight, much smaller, “fast enough”, and still very safe approach would have worked out better. For example, if you wind up swapping pages much of the time because your system wound up “large”, it doesn’t matter how fast your inner loops are …

But we will never know.

How were multi-user programs (e.g., MUDs) implemented on time-share systems? Shared memory? IPC?

Every which way …

In the sixties there was an attempt to protect large processes from each other using base-bounds registers (which were quite often merged with the hardware used for mapping swappable portions of a virtual memory).

For example, on the SDS-940 that Engelbart’s group used for “the mother of all demos”, the “Genie OS” and the hardware mapped a 64K 24-bit word virtual memory for each process as pages of 2K words each. This OS did have a pretty good process-to-process message scheme, so processes could communicate. But many kinds of unprotected threading were also used within these protection boundaries (they did everything that could possibly help their goals — and they were good enough to survive the risks they took).

Basically, most systems were a combination of “good” and “bad” ways to do things.

An interesting exception — and a very early one — was the Burroughs B5000, which used very different schemes in HW and SW from all the rest, and “in general” every part of their systems was much cleaner, safer, more efficient, etc.

Finally, certain programming language/systems, including several at Parc, were designed successfully to be highly safe, and thus were able to do in software (usually microcode and the VHLLs) what normally would require specific HW to accomplish.

Does abstraction in computer science use mathematics as a basis?

“Mathematics” doesn’t have a crisp definition, and its plural form has admitted many new thinking schemes over the years.

I think we are doing mathematics whenever we are making a concerted effort to relate organizations of ideas to each other (von Neumann called mathematics “relationships about relationships”).

From this point of view “abstractions in computer science” are a form of mathematics.

If “mathematics” in the question is supposed to mean “already existing forms of mathematics”, then the answer is also yes — and ranges from the use of logic*, to schemes to represent numbers and operations, to correspondences that are functional, “functional for a particular time”, and “quasi-functional”, etc., to systems organizations and notions about process.

---

When I was learning “mathematics” more than 70 years ago, many logicians held that Logic was a separate field and earlier efforts in the century had been made to “get Mathematics from Logic”. This division could still be the case (but in my view they are all part of the same set of ideas and ways to approach things).

How does the brain interpret computer languages?

There is not much definitive understanding about how the brain does most things, including traditional languages, mathematical languages, music, computer languages, etc.

It’s worth noting that virtually all the studies of fluency in any of these have found that what the brain/mind knows is strongly projected onto sense impressions (a) in order to recognize them at all, and (b) to make many instant predictions about what else is likely to be true. For example, when a numerate person is shown a number, they don’t just recognize it as a number, or see what the number signifies, but they also “project out” the number into many perspectives, including magnitudes, possible factorings, relationships with other numbers, etc. This is also the case with music, and with computer languages.

In all cases, the “projecting out” that manifests “possibilities” is a critical part of how we understand.

In a logic system, this would be called “forward inferencing” or “aggressive inferencing”. It puts many more things in play than were in the original stimuli and offered knowledge.

Do you agree with Myers and Rosson's statement that 'nearly half of entire software development effort relates to the user interface'? Defend your answer.

This sounds like a homework assignment.

If you were to read their paper, you’d see that it was written 30 years ago in 1992. Practically the first thing that Brad Myers complains about is that the previous survey dated back to 1978, and there was a need for one that was up to date!

And to give you more of an idea about then vs today, he said they got quite a wide response to their survey (it was all of about 70 responses!)

I think this is an excellent paper, but for the purpose of the question, you need to find a much more recent claim so you can ask the question more accurately.

When Alan Kay invented the user interface (screwdriver metaphor), had he in mind disabled people too?

I didn’t invent the user interface — or the graphical user interface — many of the basic ideas were first invented and implemented in the 50s, and especially in the first half of the 60s — and quite a few of these were done by DoD supported research, especially by ARPA.

Even the early ideas had considerable range, including ideas that were like “hand-tools” and those that were like “people as helpers”. These included the use of AI in various ways. John McCarthy’s “Advice Taker” in 1958 was an AI “interface agent” that could interact in commonsense terms. The notion of “an Oliver” (named for Oliver Selfridge) as a kind of super Siri was a mid-60s scenario. Hands free UIs were part of the larger design space.

An early Braille UI for a disabled person was for Ted Glazer at MIT, who was physically blind but mentally had great sight (he designed the displays for DEC and was a power in programming languages back then).

My father was a physiologist whose central interest was both the design and psychology of prosthetics, first for veterans, and then for Thalidomide babies. This was a background theme for me growing up.

“Personal computing” was a notion within ARPA that predated “personal computers”. I got interested in the latter because of the possibilities for rich and quick interactions, and — as a musician — started to use playing as a metaphor: “a computer is an instrument whose music is ideas”.

I was working on a desktop personal computer design in the last part of the 60s when I met Seymour Papert and saw what he was doing with children and “powerful ideas”. This flipped my perspective about most things, and I started to think about “children’s computing” and children’s computers and user interfaces.

One of the flips was from “tools” and “vehicles” and “agents” to “media” as a larger way to think of the actual powers that were needed and furnished. Another flip was from a UI metaphor of “access to function and power” (like the control room of a nuclear submarine) to “understanding and learning” (maybe more like a Montessori school).

The Parc GUI came out of those and other flips.

We also were able to get one of the truly great eye-trackers of the day (done at SRI) and used it for many experiments. Sam Gebner at MIT (a student of Minsky’s) had done a terrific thesis using eye-tracking for many I/O tasks, including pointing, and we all expected that GUIs would include eye-tracking. Nicholas Negroponte was another strong proponent of this.

Today, it is really a shame — and worse — that with tiny cameras in so many personal computers and phones, very little has been done to do eye- and pupil dilation- tracking. One corneal reflection (using IR) is good enough for about 2°–3°, and this is about the size of a word on a screen.

However, the general art of UI itself has diminished gradually and now greatly as the commercialization of personal computing “progressed”, starting in the 80s.

How do computers work? I am 16 and I love programming, but I feel like I should know what really happens on the inside, the physics of it.

I’ve answered various parts of this question elsewhere in Quora. Try this one to get started. You will see that the fundamental building blocks for making a computer can be made in many ways, and often from completely common materials.

Alan Kay's answer to How do computers work the way they do? When does electricity become executable logic and how?

Check back here every few days for a few more references that will help.

How do computers work the way they do? When does electricity become executable logic and how?

Many of the answers here have more elements than needed — for example, transistors and how they work are not required.

One way to think about digital programmable computers is that they can be completely made from “switchable switches”, that is: switches that can be switched by signals to produce signals that can switch switches.

Electricity is not needed. For example, switchable switches — and thus a programmable computer — can be made from tinkertoys, ropes and pulleys, water or gas flow, etc.

image

Tinkertoy logic. Signals are “push” or “no-push” of a rod.

The top construction will transmit a “push” if “either or both” of “INPUT A” or “INPUT B” are a “push” — this is the logical operation “OR”.

The bottom construction will push if the input is NOT a push, and will NOT push if the input is a push. This is the logical operation “NOT”.

These two operations are sufficient to make any computer. For example, here is a TIC TAC TOE playing tinkertoy computer made by Danny Hillis and Brian Silverman.

image

Electricity is convenient in many ways. One way to get electricity to cause something to happen is by making electromagnets (a battery, nail, and insulated wire are all that is needed).

Making an Electromagnet. Up to a point, a battery can be thought of as something like a canister vacuum cleaner that produces high pressure at one end (the top) and low pressure at the bottom. Wires can be thought of as tubes. Charge can be thought of as particles. The tube on top is at pressure, and the particles will flow to the low pressure tube on the bottom if a tube is used to connect them. This flow can cause work to happen.

image

There’s more to electricity than this, but much can be visualized just by keeping the flow of charge in mind. “Voltage” (or “tension”) is analogous to “pressure”, and “current” is analogous to particle flow.

image

When charge is able to flow through a wire, an electromagnetic field is produced, and this can be intensified by a magnetic metal such as an iron nail.

We can make a switchable switch by having an electromagnet pull on a physical switch to turn on or off current in a further set of wires.

image

This is all that is needed. Note that many things can be done just with these ingredients, and especially, all that is necessary to make any computer.

Here, the output is the same as the input, and this “relay”ing of the signal (using a fresh source of current) was how this was originally used in telegraphs in the 19th century.

But we could organize the spring the other way, and this would send the “negation” (the opposite, the NOT) of the input. (Below we’ll use NOT in a very interesting way.)

We could organize two of the bottom elements so both electromagnets need to pull their switches down to transmit a signal — this would be the AND logic operation.

And, on the right is shown how to organize things so that either OR both inputs will turn on the light.

Do you see how these work?

image

How to make the AND and OR operations using electricity and relays.

A good way to think about this kind of “active logic” is that all the operations are comparisons of the inputs, and the “logic operation result” is delayed a bit from the comparison.

Next is an example that uses “NOT” to make a door buzzer (and if we squint at it, we can also see that this is a ticking clock). This is worth pondering for a bit.

The left side shows the wiring and the state with the button/switch “open”/”off”/”up”.

On the right side we see what happens with the button “closed”/”on”/”down”. The action of the electromagnet pulls the switch open/off which interrupts the current, this lets the spring pull the switch back closed/on which gets the electromagnet to pull the switch open again.

We will hear a buzz. The light will be flashing on and off. And the output will be a train of on-off signals (that we can use as a “clock” in making our computer).

image

Note that this “logic” is paradoxical in standard mathematical logic

x = NOT x

but makes sense if we add the time delay needed in the physical world

x(t+1) = NOT x(t)

In fact, on a computer, it is this operation that defines time for the computer, and what “before” and “after” mean.
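A minimal sketch of that timed negation, with each pass through the loop standing in for the electromagnet/spring delay:

    x = True                 # the switch starts closed
    for t in range(6):       # six "ticks" of the buzzer/clock
        print(t, x)
        x = not x            # x(t+1) = NOT x(t)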

The combination of NOT and AND can make the NAND operation, and can be a universal building block for any computer.

Because the logic is independent of the materials, most computer designs are expressed just using logic symbols.
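As a small illustration of that independence from the materials, here is a sketch in Python (standing in for any technology of switchable switches) of NAND as the universal building block, with the other operations wired up from it:

    def NAND(a, b):
        # any "switchable switch" technology can implement this one operation
        return not (a and b)

    # everything else can be built purely from NAND:
    def NOT(a):    return NAND(a, a)
    def AND(a, b): return NOT(NAND(a, b))
    def OR(a, b):  return NAND(NOT(a), NOT(b))

    # quick check of the truth tables
    for a in (False, True):
        for b in (False, True):
            print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))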

Some early computers were made from relays, and they were pretty reliable — but were bulky, and required a fair amount of power. Vacuum tubes could do the same switching operation and were a lot faster, but had their own idiosyncrasies. Transistors were smaller and low powered, and especially could be fabricated as circuits on silicon wafers.

If you are interested in this, there are a host of books and other online material, including interactive building kits.

What would happen if a computer was made for two users, with a monitor in common and two keyboards? Would it make programming more sociable?

To just pick two gems from the past, take a look at Doug Engelbart and NLS, and Hiroshi Ishii, especially his work at NTT and “Clearpane”.

If advanced software is technically just made up of lots of really simple components, why can't we replicate them using AI or no-code platforms?

Living things are made mostly from 6 simple atoms plus a few trace elements. What’s “interesting” is the unusual organization of them. This is also the main difficulty with software, whether written by humans or other software (i.e. “AI”).

How is an Operating System run on Transistors? Like how is software put on it and getting a graphical output from all of this?

I’ve answered much of this question already on Quora. I’ve put a few of these links below.

The way to start understanding this is to first see - A - how patterns can be represented by various kinds of “marks”, including in electrical devices such as switches, relays, etc. You don’t need transistors for this (they are just convenient).

Second, is to see - B - how some electrical devices can receive signals, compare them, and do one thing or another depending on the signal.

These two ideas can be combined into a system that has an “A Part” that will hold marks that represent both “materials” (such as numbers) and “directions” (called programs), which can be interpreted by a “B Part” that will manipulate the “A Part” (including both the “materials” and the “directions”).
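Here is a toy sketch of that combination (not any particular machine, and the instruction codes are made up purely for illustration): one array of marks, the “A Part”, holds both the directions and the materials, while a tiny interpreter, the “B Part”, follows the directions.

    # made-up instruction codes:
    #   1 a b dest -> memory[dest] = memory[a] + memory[b]
    #   2 addr     -> print memory[addr]
    #   0          -> halt
    memory = [
        1, 12, 13, 14,     # directions: add the two "materials", store the result
        2, 14,             # directions: print the result
        0,                 # directions: halt
        0, 0, 0, 0, 0,     # unused marks
        20, 22, 0,         # materials: two numbers and a result cell
    ]

    pc = 0                                   # the B Part's place in the directions
    while memory[pc] != 0:
        op = memory[pc]
        if op == 1:
            a, b, dest = memory[pc + 1: pc + 4]
            memory[dest] = memory[a] + memory[b]
            pc += 4
        elif op == 2:
            print(memory[memory[pc + 1]])    # prints 42
            pc += 2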

An “operating system” is just a collection of general programs that are written to help programmers write new programs. “Graphical output” is represented using exactly the same kinds of markings used in the “A Part”, that are generally interpreted as a color for each dot on your screen (so there are a lot of these).

For example, here is an actual “portrait” of the RAM memory of the Xerox Parc Alto. It had 65K 16 bit words (128K bytes, ~ 1 million bits). A little less than half the memory is used for the 808x606 display with one bit per pixel, and a little more than one half is used for everything else (in this case Smalltalk, which combined an operating system, gui, interactive development system, and user media and other tools, including text, drawing, animation, music, and desktop publishing, etc.)

image

The display part of the memory looks just like what is put on the screen (part of the Alto hardware scans this continuously and makes the CRT display light up where there is an “on” bit), and the rest shows somewhat random patterns characteristic of what SW looks like as bits.

Alan Kay's answer to How was the first computer operating system made if there were no programming languages?

Alan Kay's answer to How do computers deal with images, written texts, videos, etc.? Does it transform them into 0 and 1 codes?

Alan Kay's answer to How were computer programs and languages like COBOL installed into computers back in the 50s before software?

What is the revolution of computers?

Questions similar to this have been asked a few times (I’ve already answered two of them — take a look). And here’s one of the answers: Alan Kay's answer to At OOPSLA 1997 Alan Kay gave a talk titled "The computer revolution hasn't happened yet". What parts have materialized thus far and if not why not?

What is a program, and why do computers need programs?

A “computer” is any system that manipulates information, most often to help generate inferences. It was first used to refer to people who did this, especially in the case where the information was numeric and the inferences were arithmetical.

Some of the earliest added tools were in the form of tally sticks, and eventually many variations of the abacus, in which the information is organized in the form of markers of some kind that conveniently represent numbers for arithmetical manipulations.

The human part of this system is taught how to manipulate the tokens to e.g. add or subtract numbers. These added schemes of “do this, then do that, compare this to that, and if one way do this, etc.” can very well be called “programs”.

This idea of having a general piece of machinery to which “instructions” of different kinds can be added to produce different behaviors was eventually put entirely into machinery. Among the earliest reported examples are the programmable rolling toys of Heron of Alexandria.

This idea was used for the barrel organ and music box, where the general part could play any combination of notes in any sequence, and a program was set up to play each individual piece.

The barrel organ influenced weaving, and there were a number of programmable looms made that were controlled by a similar idea and mechanism — the most well done, and famous, was the Jacquard Loom ca 1800.

Babbage and Ada Lovelace recognized that the separate program idea — and the particular mechanism of the Jacquard Loom — could be used to program a general calculating machine he called the Analytical Engine. The idea that programs could be written to do more than arithmetical calculations appeared — for example to write music — and “the more” included being able to create programs themselves.

Many stories and years later, Turing illustrated a theoretical mechanism that was “universally general”, in that the general mechanism could follow a program on a mostly blank memory that described the operations of machines as general as itself. An important addition was to allow the machine to write symbols on the memory as well as to be able to read them.

This meant that any such general machine could simulate and carry out the operations of any machine.

The particular way this was organized — a relatively small, simple, fixed mechanism plus an arbitrarily large general memory that could hold marks representing symbols, including both programs and plenty of further space for markings made by the computer — wound up being the plan for the first physical programmable computers starting in the late 1940s.
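
For flavor only, here is a toy version of that organization: a small fixed rule table that reads and writes marks on a memory (“tape”). The particular rules below just add 1 to a binary number; they are not anything Turing wrote, only an illustration of the scheme.

```
# A toy illustration of the scheme: a small fixed rule table reads and
# WRITES marks on a memory ("tape").  This particular table adds 1 to a
# binary number written on the tape.

tape = {0: "1", 1: "0", 2: "1", 3: "1"}   # the number 1011 (eleven)
head = 3                                   # start at the rightmost digit
state = "carry"

rules = {
    # (state, symbol read): (symbol to write, head move, next state)
    ("carry", "1"): ("0", -1, "carry"),
    ("carry", "0"): ("1",  0, "halt"),
    ("carry", " "): ("1",  0, "halt"),
}

while state != "halt":
    symbol = tape.get(head, " ")
    write, move, state = rules[(state, symbol)]
    tape[head] = write
    head += move

print("".join(tape[i] for i in sorted(tape)))   # 1100 (twelve)
```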

To answer your question: a “computer” doesn’t need to be programmable to be useful, but “programmable computers” greatly expand the ease of making computing mechanisms by providing (a) a fixed part that can read and follow instructions, and (b) sets of instructions — programs — that create desired behaviors when followed.

Is computer science a real science like physics or is it a Cargo Cult Science with fields like machine learning having no deep theory behind it ? Can you explain the reason why computer science is a science?

I think we have to get to more precision of terms to approach this question. For example, we need to agree at least on what we mean by “Science” and what is meant today by “Computer Science”.

However we define “Science”, we need to include fields such as “cultural anthropology” and “suspension bridges”.

I take “science” to mean the principled attempts to gather phenomena, to understand them and how they came about, and to communicate those understandings in terms of -models- that allow testing and criticism by matching them against the sources of the phenomena. One of the many properties that makes science so interesting and important — besides its primary approaches and methods — is that the actual “knowledge” in science is in the form of the “negotiations” between the models and the phenomena (we can think of this as “models are what our brains/minds can represent and handle, and phenomena don’t care about us at all” — it is the relationships we can make between these that are the epistemological breakthrough of science).

For “suspension bridges”, we can see that these exude phenomena that can be studied, modeled and tested, so there is no barrier for “Bridge Sciences” that stem from the bridges made by humans. *

“Cultural anthropology” gives rise to considerable phenomena, but relatively diffuse models. This calls for deeply careful and serious “negotiations” — “the softer the field, the tougher you have to be”. Since science itself took hundreds of thousands of years to invent — and to get “deeply careful and serious” even about Physics — we can see that it is possible to “get tough” via will, and thus to get tougher still as we explore more complex systems.

Now we have to deal with “computer science” as a term. It used to mean an aspiration to be a true “science of the artificial”.* The term today is most often just an idiom for practices that should mostly be grouped under “engineering” (however rudimentary), and for which most in the field “don’t care what these terms mean” (sounds like quite a few computer programs I’ve seen!)

But I was around when the “computer science” and “software engineering” terms were coined, and have always taken them — as did the folks back then — as serious aspirations: pursuits to be grown into real fields without diluting either “science” or “engineering”.

Here’s a good example of “real computer science” from the late 50s.

John McCarthy wanted to make an “Advice Taker”, a “real AI” interface agent that could deal with humans in commonsense terms and reasoning, and which could both take and give “advice”. He wanted/needed a programming language that would be suitable for programming a “common-sense reasoning system”.

Looking at computers and computer languages of the day (FORTRAN had just happened, and there was also an odd imperative list structured language around), John wound up making a model of computing that was represented in a kind of mathematical notation, and was a kind of math, and which used itself as a means of definition. You could use it to generate results from different parts of the theory.

This definition was not too far from something that could actually be implemented on a computer of the day — and this was done. This created a new artifact — a kind of bridge that could be walked across and tested — and what could be done with it gave rise to phenomena different enough from both FORTRAN and from its own first version to prompt further modeling and to create stronger theories.
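
To give a small taste of what that bridge felt like, here is a heavily simplified evaluator for a Lisp-like notation, sketched in Python (the real thing was written in Lisp itself, and this toy leaves out almost everything, but it shows how a few lines can give meaning to programs represented as nested lists).

```
# A heavily simplified "evaluator": programs are just nested lists/tuples,
# and a small function gives them meaning.  (The real Lisp eval was written
# in Lisp itself, which is part of what made it such a good model.)

def evaluate(expr, env):
    if isinstance(expr, str):                 # a variable name
        return env[expr]
    if isinstance(expr, (int, float)):        # a literal number
        return expr
    op, *args = expr
    if op == "quote":
        return args[0]
    if op == "if":
        test, then, alt = args
        return evaluate(then if evaluate(test, env) else alt, env)
    if op == "lambda":                        # ("lambda", ["x"], body)
        params, body = args
        return lambda *vals: evaluate(body, {**env, **dict(zip(params, vals))})
    fn = evaluate(op, env)                    # otherwise: apply a function
    return fn(*(evaluate(a, env) for a in args))

env = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
square = ("lambda", ["x"], ("*", "x", "x"))
print(evaluate((square, ("+", 2, 3)), env))   # 25
```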

We have two important things to contemplate here. The first is a computer example of a “science of the artificial” and what a model in this science could look like.

The second is a wonderful thing about the last few centuries: that the combination of science and engineering in the sciences of the artificial creates a delightful gorgeous pathway to both great art and great enlightenment. Things are made, they give rise to ideas about them, the ideas give rise to advances in the things that are made, etc.

However, because “anything can have a ‘theory’ put underneath it”, we have to judge any science — any set of beliefs, knowledge, theories, etc. — by the thresholds that its models/theories do — or do not — cross.

I judge what McCarthy did to be really good.

If the area in question is about “intelligence”, I think we have to take this term very seriously in order not to just start BSing.

For example, we can have a lot of theory about “back-propagation” and think it good if the topic is “certain kinds of correlations” but, if the topic is “intelligence” in any reasonable sense of the term, we can’t think the current most popular theories have much content at all.

Strictly speaking, a “Cargo Cult” is a belief system in which the believers think something good will happen if you can just find the right rituals to perform. This behavioral pattern is very human, and can be found everywhere, including the most so-called “advanced” cultures.

I think I can see these behaviors in many places in computing. Can you?

---

* cf. Herb Simon, “The Sciences of the Artificial”

Is there anyone out there capable of writing a guitar tab from audio alone? I have a riff that I really want to play, but I only have 1 video recording of it and think tab would help.

There is a profession called “take downing” with pros called “take-downers” who listen to music and write out the musical scores of what happened, and what is needed to play it. Most musicians can do this to one extent or another — the pros can do it really well and really quickly. The fees are generally quite reasonable.

Any good guitarist should be able to take the musical score for a guitar piece and invent one or two TAB versions that will help to play it. This is a bit more complicated because where things are played on a guitar often depends partly on the guitarist doing the playing. Sometimes there is just one way that might not be at all obvious — this can take some work to find.

When I was playing guitar, I much preferred “piano notation” (concert pitch two staff notation) to guitar notation (one staff written an octave higher — essentially as a tenor part), and both of these to TAB (which is the least musical and useful of the notations). I think most guitarists who play “developed music” (such as jazz and classical) and who often use scores, will use the standard single staff guitar notation, and annotate some of the critical positions and fingerings for one or both hands. This works better than TAB IMO.

These days my too old left thumb won’t let me play guitar, and I’ve found the lute to be a really wonderful instrument: much more than a substitute — a cornucopia of expression. Most lute music was written in one form of TAB or another and — as above — this is occasionally useful, but most of the time is a pain, even when you get fluent in being able to read it. When learning a piece, I will usually work out the standard notation (fits well on piano staff) for most pieces in order to better grapple with the musicality of the piece. Many modern lute book editions include a parallel piano notation of the old pieces for just this reason.

Just for fun, here is an example of a lute piece from more than 400 years ago by John Dowland. He wrote the bottom TAB part, which has also been transcribed (in modern times) to a concert pitch piano score version placed on top.

[image]

The TAB part is in “French notation” (one of many in use at the time).

The six lines are the six “courses” (the lutes this was written for have mostly doubled strings for each “course”) and are organized as they are in today’s guitar tabs (so the highest pitched string is at the top).

The tuning of the courses from the bottom to the top is almost like a guitar at the third fret (G C F A D G) — note that the major 3rd is one string lower. The frets are indicated with letters: “a” = the open string (the “a” looks a bit like a short “d”), “b” = fret 1, “c” = fret 2 (it looks a bit like an “r”), “d” = fret 3 (it looks a bit like a “d” tilted to the left), etc.

The durations of the “note slots” are given by the flags over the staff. A blank duration reuses the previous one (note that this does not generally indicate held notes, so you have to deduce as well as you can).

The bar lines have a slightly different meaning (a whole measure is a kind of a single pulse — called a “tactus”). This invites more freedom within each measure (note the larger size of the measures in the TAB).
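
Just to make the letter scheme concrete, here is a toy sketch that uses only the conventions described above; real tablatures have many more wrinkles, and the helper names are made up.

```
# A toy mapping from French-tab letters and courses to pitch classes,
# using only the conventions described above.

FRET_FOR_LETTER = {letter: fret for fret, letter in enumerate("abcdefgh")}
# a = open string, b = fret 1, c = fret 2, d = fret 3, ...

OPEN_COURSES = ["G", "C", "F", "A", "D", "G"]   # lowest line to highest line
CHROMATIC = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch(line_from_bottom, letter):
    """Pitch class for a tab letter on a given line (1 = lowest course)."""
    open_note = OPEN_COURSES[line_from_bottom - 1]
    fret = FRET_FOR_LETTER[letter]
    return CHROMATIC[(CHROMATIC.index(open_note) + fret) % 12]

print(pitch(1, "a"))   # open lowest course -> G
print(pitch(3, "c"))   # fret 2 on the F course -> G
```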

Should web browsers have stuck to being document viewers?

Actually quite the opposite, if “document” means an imitation of old static text media (and later including pictures, and audio and video recordings).

It was being willing to settle for an overly simple text format and formatting scheme — “for convenience” — that started the web media architecture off in entirely the wrong direction (including the too-simple reference scheme; cf. Doug Engelbart and Ted Nelson). Circa early 90s, it had the look and feel of an atavistic hack. I expected that Netscape would fix this rather than just try to dominate what was there (I expected a better architecture both for “thinking about media in the age of computing” and also something not like “an app” but more like an operating system to deal with the actual systems requirements, demands, and scalings of the world-wide Internet-in-progress).

It’s both surprisingly and dismayingly difficult to get people — especially computerists — to criticize the web and the web browsers — even more so perhaps today.

This is despite the glaring fact that the interactive media provided by the web and browsers has always been only a reduced subset, in both scope and quality, of the media possible on the very personal computers that run the web browsers.

At the time of the WWW’s inception — in the early 90s — I made several recommendations — especially to Apple where I and my research group had been for a number of years — and generally to the field. These were partially based on the scope and scalings that the Internet was starting to expand into.

Apple’s Hypercard was a terrific and highly successful end-user authoring system whose media was scripted, WYSIWYG, and “symmetric” (in the sense that the “reader” could turn around and “author” in the same high-level terms and forms). It should be the start of — and the guide for — the “User Experience” of encountering and dealing with web content.

The underlying system for a browser should not be that of an “app” but of an Operating System whose job would be to protectively and safely run encapsulated systems (i.e. “real objects”) gotten from the web. It should be the way that web content could be open-ended, and not tied to functional subsets in the browser.

I pointed out that — as with the Macintosh itself — these two recommendations — which seem to be somewhat at odds — have to be reconciled. The first recommendation would be the next stage in the excellent Macintosh “guidelines” about its user experience (Chris Espinosa and others have never been praised highly enough for this important work). These guidelines laid out the conventions to be followed for any app of any functionality — they are the parts that must be similar.

The second recommendation was to reinforce the idea that the content to be run within the system had to be as free from the tools of the OS as absolutely possible (because special needs often require special designs, etc.). An example was that the content needed to be able to generate its own graphics if necessary (even if the OS supplied some graphics tools). The more the content wanted to go its own way, the more its presentation to the users had to be made to conform to the standards in (1). As with any decent OS, it has to allow for new ideas while also providing the resources for safety, efficiency, and manifesting user experiences.

If we squint at some of the implications of both of these, we can find a number of good principles from the past. One of them — as a real principle — I trace to the first Unix systems at Bell Labs. The design was partly a reaction against the extremely complex organization of the Multics OS at MIT. One of the great realizations of the early Unix was that the kernel of an OS — and essentially the only part that should be in “supervisor mode” — would only manage time (quanta for interleaved computations) and space (memory allocation and levels) and encapsulation (processes) — everything else should be expressible in the general vanilla processes of the system. More functionality could be supplied by the resources that came along with the OS, but these should easily be replaceable by developer processes when desired.

The original idea was to instigate as much progress as possible without incurring lock-in to a huge OS, but to protect what needed to be protected and ensure a threshold of system integrity and reliability.
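
One way to picture just the “manage time” part of such a kernel is the toy sketch below, which uses Python generators as stand-in “processes” and a trivial round-robin loop handing out quanta. It is purely illustrative and does none of the memory management or encapsulation a real kernel must do.

```
# A toy picture of the "manage time" part: each "process" is a generator
# that yields when its quantum is up, and a trivial scheduler interleaves
# them.  (Real kernels also manage memory and enforce encapsulation, which
# this sketch does not.)

from collections import deque

def process(name, steps):
    for i in range(steps):
        print(f"{name}: step {i}")
        yield                      # give the quantum back to the "kernel"

def kernel(processes):
    ready = deque(processes)
    while ready:
        proc = ready.popleft()     # pick the next ready process
        try:
            next(proc)             # run it for one quantum
            ready.append(proc)     # still alive: back of the queue
        except StopIteration:
            pass                   # finished: drop it

kernel([process("A", 3), process("B", 2)])
# prints A and B interleaved: A step 0, B step 0, A step 1, B step 1, A step 2
```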

Sidebar: perhaps the best early structuring and next stage design of Unix was Locus by Gerry Popek and his researchers at UCLA in the early 80s. Locus allowed live Unix processes to migrate not just from one machine to another on a network, but to a variety of machine types. This was done by combining the safety required for interrupts with multiple code hooks in each process, so an “interrupt” could allow the process to be moved to a different machine and resumed with different (equivalent) code. It was easy to see that combining this with an end-user language would provide a network-wide system that would run compatibly over the entire Internet. Soon after arriving at Apple ca 1984, I tried to get them to buy Locus, but the “powers that be” at the time couldn’t see it.

Note that when such a system is made interactive — e.g. using the sweeping ideas from the ARPA/Parc research community — the end-users need to have a user interface framework that is generically similar as much as possible over all applications — and that this can conflict with the freedoms needed for new ideas and often new functionalities.

So this is an important, large, and difficult design problem.

My complaints about the web and the web browsers have been about how poorly they were thought about and implemented, and how weak are both the functionalities of web content and the means for going forward and fixing as many of the most critical mistakes as possible.

One way to look at where things are today is that the circumstances of the Internet forced the web browsers to be more and more like operating systems, but without the design and the look-aheads that are needed.

There is now a huge range of conventions both internally and externally, and some of them require and do use a dynamic language. However, neither the architecture of this, nor the form of the language, nor the ways one gets to the language, etc., are remotely organized for the end-users. The thresholds are ridiculous when compared to both the needs and the possibilities.

There is now something like a terribly designed OS that is the organizer and provider of “features” for the non-encapsulated web content. This is a disaster of lock-in, and with actually very little bang for the buck.

This was all done after — sometimes considerably after — much better conceptions of what the web experience and powers should be like. It looks like “a hack that grew”, in part because most users and developers were happy with what it did do, and had no idea of what else it should do (and especially the larger destinies of computer media on world-wide networks).

To try to answer the question, let me use “Licklider’s Vision” from the early 60s: “the destiny of computing is to become interactive intellectual amplifiers for all humanity pervasively networked worldwide”.

This doesn’t work if you only try to imitate old media, and especially the difficult to compose and edit properties of old media. You have to include all media that computers can give rise to, and you have to do it in a form that allows both “reading” and “writing” and the “equivalent of literature” for all users.

Examples of how to do some of this existed before the web and the web browser, so what has happened is that a critically weak subset has managed to dominate the imaginations of most people — including computer people — to the point that what is possible and what is needed has for all intents and purposes disappeared.

---

Footnote about “Ever expanding requirements at Parc” (prompted by Phillip Remaker’s comment and question)

When Gary Starkweather invented and got the first laser printer going very quickly, and at astounding speeds (a page per second, 500 pixels per inch), there was a push to get one of these on the networked Altos (for which the Ethernet had been invented). The idea was to use an Alto as a server that could set up and run a laser printer to rapidly print high quality documents.

Several of the best graphics people at Parc created an excellent “printing standard” for how a document was to be sent to the printer. This data structure was parsed at the printer side and followed to set up printing.

But just a few weeks after this, more document requirements surfaced and with them additional printing requirements.

This led to a “sad realization” that sending a data structure to a server is a terrible idea if the degrees of freedom needed on the sending side are large.

And eventually, this led to a “happy realization”, that sending a program to a server is a very good idea if the degrees of freedom needed on the sending side are large.

John Warnock and Martin Newell were experimenting with a simple flexible language that could express arbitrary resolution-independent images — called “JAM” (for “John And Martin”) — and it was realized that sending JAM programs (i.e. “real objects”) to the printer was a much better idea than sending a data structure.

This is because a universal interpreter can both be quite small and also can have more degrees of freedom than any data structure (that is not a program). The program has to be run in a protected address space in the printer computer, but it can be granted access to a bit-buffer, and whatever it does to it can then be printed out “blindly”.

This provides a much better match up between a desktop publishing system (which will want to print on any of the printers available, and shouldn’t have to know about their resolutions and other properties), and a printer (which shouldn’t have to know anything about the app that made the document).

“JAM” eventually became Postscript (but that’s another story).

Key Point: “sending a program, not a data structure” is a very big idea (and also scales really well if some thought is put into just how the program is set up).
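
Here is a toy rendering of that key point (all of the names and the little drawing “language” are invented for illustration): the printer side owns only a bit buffer and a small interpreter, and the “document” arrives as a program that draws into the buffer.

```
# A toy rendering of "send a program, not a data structure".  The printer
# side owns only a bit buffer and a small interpreter; the "document"
# arrives as a little program that draws into the buffer.

WIDTH, HEIGHT = 40, 12
bit_buffer = [[0] * WIDTH for _ in range(HEIGHT)]

def run_on_printer(program):
    """The printer's small universal part: run the document's program
    against the bit buffer, then print the buffer 'blindly'."""
    def set_bit(x, y):
        if 0 <= x < WIDTH and 0 <= y < HEIGHT:
            bit_buffer[y][x] = 1
    program(set_bit)               # the program is only handed set_bit
    for row in bit_buffer:
        print("".join("#" if b else "." for b in row))

# The "document": a program with its own degrees of freedom.  Here it just
# draws a rectangle, but it could compute anything it likes into the buffer.
def document(set_bit):
    for x in range(5, 35):
        set_bit(x, 2)
        set_bit(x, 9)
    for y in range(2, 10):
        set_bit(5, y)
        set_bit(34, y)

run_on_printer(document)
```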

What old computers were way ahead of their time?

If the early 80s is “old”, then what are the early 50s or early 60s?

“ca. 1950” Let me submit “Whirlwind”, one of the earliest parallel architectures, and which was designed to be able to do real-time interactions, especially using graphical displays and pointing devices. It looks architecturally like a 16-bit mini-computer — the DEC ones in the 60s came from it — but required many racks of equipment, etc. The key technology that got commercial computing going — core memory — was invented on/for this machine. The SAGE air defense system sprang from this — and the Air Traffic Control system used today came from SAGE. Ivan Sutherland’s Sketchpad was done on an experimental SAGE computer that came from Whirlwind, and was designed by one of the designers of Whirlwind.

“ca. 1960” The Burroughs B5000, the first computer architecture in which almost every feature was designed to run higher level languages, and safely. Today’s “byte-coded” virtual machines are software versions of part of the B5000’s hardware. It had multiple processors, virtual memory, automatic process switching, capability protection, and much more. It was basically a kind of “object-oriented machine”. I once remarked that “7 of the 10 best software ideas first appeared in the hardware design of the B5000” (this is a metaphor … but has the deeper truth that metaphors can sometimes express best).
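
For readers who haven’t met the idea, here is the “byte-coded virtual machine” notion in miniature, sketched in Python. This is emphatically not the B5000’s instruction set, just a generic little stack machine to show what “a software version of the hardware” can mean.

```
# The "byte-coded virtual machine" idea in miniature.  (This is not the
# B5000's instruction set; it is just a generic little stack machine.)

def run(code, constants):
    stack, pc = [], 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        if op == "PUSH":
            stack.append(constants[code[pc]])
            pc += 1
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "RET":
            return stack.pop()

# (2 + 3) * 4 expressed as a little "byte code" program
print(run(["PUSH", 0, "PUSH", 1, "ADD", "PUSH", 2, "MUL", "RET"], [2, 3, 4]))  # 20
```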

Whirlwind’s designs established the hardware patterns that came after it, but the B5000 ran up against most HW (and most SW) people of most eras not understanding anything important about software. We used many of the B5000 ideas in the Xerox Parc microcoded machines, both in “HW” and “SW”, and they were one of the major reasons for the Parc successes.

What books would you recommend for an autodidact who is interested in the field of human-computer interaction?

All learning — in the end — is autodidactic (we have to do the work to learn: no one can give us a transfusion). The main thing that is required for any learning is a sense of threshold — the level of process, knowledge, skill, etc. that you have to get beyond before you are actually fluent. Next you have to have a sense of sweep and range and depth, to attain a larger sense of the field. This is why most autodidacts who get above all these thresholds in a subject have also gone to school of some kind (this can help a lot, especially if it is not thought of as the central source of all knowledge).

For UI I think there are likely to be more than 100 books that need to be absorbed, and I apologize in advance for not sitting down and trying to put together a list.

Basically, you need to learn about human beings from a number of angles, most of them psychological and anthropological. And this will include wide excursions to books such as “Art and Illusion” by Gombrich, “Visual Thinking” by Arnheim, “The Magic Mirror” (about theater) by Nathan, and much more.

“Portrayals and Visualizations”: e.g. Ed Tufte’s books, and Bret Victor’s ideas and works.

You need to understand the nature of human communication, and also what it means for a human to get fluent in a new medium (so understanding what McLuhan was driving at is critical — and also some of his predecessors and successors: for example, Innis, Havelock, Ong, Postman, etc.)

Quite a bit of Jerome Bruner will help, as will Hadamard’s “The Psychology of Invention in the Mathematical Field”. These are about “multiple ways of knowing and thinking”. Kahneman’s “Thinking, Fast and Slow” and some of the “Behavioral Economics” work that has come from it need to be included.

I got a lot — in general — from reading the turgid but important Korzybski, especially “Science and Sanity”. And Koestler’s “Act of Creation” (in contrast, a smoothly written book that also requires real concentration).

Learning about “intentions” for UIs is very important. For example, Engelbart’s original 1962 proposal, the Joss “Designer’s View” paper by Shaw, Teitelman’s MS and PhD theses at MIT, Bret Victor again, (and many more).

Learning about classical great UI designs is a must: from JOSS, to Sketchpad, to GRAIL, to Teitelman’s DWIM, to the work at Parc, to Hypercard, etc. I’ve given some presentations about how the current pervasive “Parc GUI” got invented, including many of its precursors, etc. Many of these are available on YouTube.

Brenda Laurel’s book “The Art of Human-Computer Interface Design”, and the book “Watch What I Do” by Allen Cypher, are musts.

“Real AI” (cognitive AI) will be key in UI designs in the future along many important dimensions.

And much much more.

You asked for books, but — as with the theater (which has much to do with UI and vice versa) — it is not possible to learn this without also actual experience in designing and making UIs. It’s the combination of learning how to do hundreds of experiments, plus what has gone before, plus the few important principles that are known, that creates fluency in this subject.

Above all, you have to develop not just a sense of the general user but a great desire — a sympathy — to help them. The first rule in UI design is “The End Users Are Not Like Us!” For example, they don’t do UI design, and most of them are not computer folks, etc. so introspection is one of the worst approaches. (This is also true of theatrics in general.) I have found that including children as users for everything I do has helped greatly — I have more sympathy for children on the one hand, and children are generally a bit more driven to learn than adults — both of these help.

One of the biggest breakthroughs in UI design as it was getting invented was to move from trying to supply tools (and the equivalents of control panels for nuclear reactors for experts), to trying to make “learning environments” for the general public. It’s ultimately like teaching the general public how to read while allowing them to exist and function in society. What are all the dimensions of these two very different kinds of requirements?

This POV has greatly diminished in recent years, and the generally poor UIs around now reflect this (e.g. where are UNDO and HELP on the iPhone and iPad, etc ? ……)

When did cloud computing start?

There are always contenders for “first” — which often cannot be reasonably sorted out — so I like to use “early” instead.

I think of “cloud” as a name eventually given to what was earlier called an “information utility” (as an analogy to the electric power, water, gas, telephone, etc. utilities that already existed). And I think of this as “information resources” provided from the outside that can be used nearby in a number of useful ways.

The idea originated in the 50s, and was heavily influenced by the SAGE Air Defense System (which time-shared huge mainframes devoted to tracking planes, and whose facilities were networked together), with eventually two concrete independent proposals ca 1959 (John McCarthy and Christopher Strachey). As far as I can tell, an “extremely early” working embodiment of the idea in a usefully general form was the Compatible Time Sharing System at MIT ca 1961.

Many other systems followed.

In 1962 JCR Licklider set up the ARPA “Information Processing Techniques Office” (IPTO) and the “ARPA Dream” was to boost human thinking via interactive computing pervasively networked over the world. Lick got quite a bit of funding from ARPA, and early grants included to MIT to really make CTSS practical (the ARPA project there was called “Project MAC”, which had several readings, one of which was “Machine Aided Cognition”).

IPTO expanded to more than 15 projects devoted to various aspects of the dream, and eventually the ARPAnet ca 1969 started to test larger scale networking with the possibility of multiple computers as part of the “information utility”.

The story of ARPA-IPTO (which includes the later creation of Parc as part of it), is well told in the book “The Dream Machine”, by Mitchell Waldrop.

What is the history behind Alan Kay’s quote, “Point of view is worth 80 IQ points.”?

We humans are set up by our nature — and the cultures to which our nature gives rise — to deal with much of our lives and memories in stories and story-like forms. Proverbs are very short stories, and as such they are much more memorable than careful descriptive and expositional paragraphs.

A proverb at its best is actually a commercial for quite a bit more pondering. Many of them — including this one — can include themselves as subjects.

And — as with many commercials in the pop culture — and with proverbs throughout history — most are instead used directly as short stories, and most especially to rationalize an opinion or event.

For example, compare “Where there’s smoke there’s fire” with “You can’t tell a book by its cover”. Most proverbial cultures have a proverb and story for every side of every event because these are used not for “truth” but for resonance, to be dipped into for comfort and a kind of explanation that is the way people want to have something explained. This is how most religious tracts are used.

So they are a double-edged sword at best.

After not understanding Marshall McLuhan ca 1964, I was motivated to spend much of the summer of 1967 to take another deeper pass (a professor I wanted to talk to didn’t like to talk to grad students much, but he was a reader and was known to mention McLuhan … ). When the light finally dawned, I realized that McLuhan was one of a very few who had “made something that was invisible more visible”.

As with many of these momentous uncoverings, the “invisible” was slightly visible but deemed so unimportant that it was effectively not there. This was the idea — and fact — that most of what we take to be “plain reality” is actually just beliefs, and most of these are so taken for granted that we are almost never aware of them. We use them in our reasoning and decisions but think our logic is absolute rather than relative to the — “context”, “perspective”, “point of view”, “world-view”, “paradigm”, etc. — in which we are operating.

McLuhan used this as part of his assertion that the learning of a communications/representation system must require the brain/mind to change (this is what learning means), and that the most important changes are the ones that are absorbed as “context” and rendered effectively invisible. He was initially interested in the qualitative differences in thought brought by writing and then by the printing press. And then could see that television was likely to have an enormous impact on humanity (and without most people being aware of the changes that had to happen).

He was not a scientist, but once so prodded, any scientist and mathematician can see immediately that both maths and science were not just about more of something, but about vast changes in how things are looked at and thought about. Methods themselves could carry new points of view on their shoulders. This was already a topic of pondering — Kuhn etc — but without the pithy secret sauce of great aphorisms.

Arthur Koestler had recently written a terrific book about creativity — “The Act Of Creation” — and that used the explicit idea that most creativity was being able to see an idea, not just as a weak form in an initial context, but as a strong idea in a strong context. He used the imagery of “planes of thought” at different angles.

The ARPA research community in which I was luckily and happily embedded was coming up with new perspectives, POVs, contexts, etc. for computing. The McLuhan expositions made these much easier to see and think about as part of what we should be doing as researchers.

Part of the ARPA research context was drawn from “the implications behind Turing” as to “what computers actually are”. Another important part was drawn from the Cold War (and previous hot wars) as well as from other disasters in the making, such as planet-wide problems with food, water, the climate, etc.

If you try to make the invisible contexts visible, then it is easy to see that some of them are very positive with respect to advancing “civilization”, and others are terribly retrograde. What people believe to be “reality” is the most important thing to put light on, especially if their “reality” doesn’t include the idea that “their reality is mostly just beliefs with no stronger foundations”.

These thoughts persisted in this form until Parc, when I found myself having to explain “stuff” to Xerox executives. I realized somewhat grudgingly that they weren’t “paragraph types”, and tried to find “punch lines” for the talks and pop papers I started to have to come up with. This proverb was one of the earliest that popped out.

It’s worth noting here that we need to use “change of context is worth 80 IQ points” on itself. For example: “IQ points” is a metaphor, and is not about what is officially considered “IQ”. And there is no sign: no “plus” or “minus”, even though a moment’s thought on contexts/POVs/etc will show that people can often have disastrously weak contexts for some of their thinking. (Witness history, the last few years, the last few weeks, yesterday, etc.)

Similarly, the earlier “The best way to predict the future is to invent it” doesn’t say anything about what kind of future. People in power often invent terrible futures and make them happen (again, just take a look at unnecessary disasters over history, especially recently).

F = GMm/d^2 works in a context with a flat geometry and instant propagation (Newton was quite sure the latter was not the case in our universe). It definitely worked well, and was soon taken to be absolutely true and was believed as such.

This process is very similar psychologically to Joseph Campbell’s observation that religions were generally founded by social geniuses who tried to find ways — many of them metaphorical — to talk about ideas outside of the general commonsense. Many of their less sophisticated followers reified the abstractions into what they thought were actualities, and believed them rather than thinking about them.

But small parts of science are much more careful about beliefs than the general public is (and even than the rest of science). Eventually enough careful work revealed tiny discrepancies in Newton’s suggested relationship that required a very different context to give background to another compact (and more accurate) explanation.

A great metaphor is not just a commercial for more thinking, but also can be a door into stronger contexts for thinking. And, it can also be used as an even more opaque blinder (especially if the proverb above is not used to help!).

What was it like programming an IBM 1401 back in the 60’s?

The IBM 1401 was the first computer I programmed as part of a real job (as an enlisted man in the US Air Force at Air Training Command (ATC), Randolph AFB, San Antonio, Texas).

In the early 60s, the USAF had a quite difficult aptitude test made for them by IBM to screen potential programmers for any of their computers. I took it for fun in early 1962, passed it, and was assigned to Randolph AFB and to the IBM training school for the 1401. The training was done in one intense week, wall to wall.

The range of HW architectures was much wider back then, but the 1401 was still thought to be “odd”, and today would be considered very unusual. It had been developed to gradually subsume “punched card accounting machine” (PCAM) tasks (for which IBM was the world’s largest company). As with many businesses back then, Air Training Command had a number of “shops” of such machines, with perhaps 50 to 100 machines each, laid out over half an acre to an acre.

— I can’t believe that Google won’t yield a picture of one of these large “shops” — perhaps someone can find one —

[image]

A very small PCAM “shop” — imagine with 50+ machines in an enormous room …

The 1401 computer was a “character machine”, with a memory of 5K to 16K characters of 6 info bits plus a “word mark” bit each — these were set to indicate the ends of “words” (a sequence of characters in memory addressed by the low order character). A “data character” was (most generally) an encoding of a 12-row punched card column: 4 bits to encode 0–9 and the other two bits to encode the “zone row bits”. Note that this just gives you 40 combinations, but other tricks were used …

To add two numbers, the 1401 essentially did elementary school arithmetic, progressing character by character until one or both word marks were reached. (This meant that the 1401 could be set up to add one half of memory to the other half.)

The 1401’s memory cycle for one character was 11.5 microseconds … so by computer standards of today it was both tiny and a snail. By PCAM standards, and tape standards, it was quite fast enough.

It was essentially a two address machine, but the address registers persisted, so you could deal with “fields” on “cards” sequentially by e.g. doing one two address add — an opcode and two three digit addresses (which would be terminated by the word marks), and the next add would only require the opcode because it would use where the address registers wound up from the previous operation. (Instructions were processed in the left to right direction with word marks at the opcodes.)

The 1401 was thus essentially a variable sized word and variable sized instruction machine. Getting lots of stuff done was greatly rewarded by careful planning.
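
Here is a rough sketch of the flavor of word-mark arithmetic (it ignores the real 1401’s character codes, signs, and many other details; the field values and helper names are made up). Memory is a list of (digit, word-mark) pairs, an address names the low-order character of a field, and an add walks right to left until the word marks stop it.

```
# A rough sketch of the flavor of word-mark addition.  Memory is a list of
# (digit, word_mark) pairs; a field is addressed by its low-order character
# and its word mark sits on its leftmost character.

memory = [
    # a 3-digit field "275" at addresses 0..2
    (2, True), (7, False), (5, False),
    # a 4-digit field "0948" at addresses 3..6
    (0, True), (9, False), (4, False), (8, False),
]

def add_field(src_addr, dst_addr):
    """Add the field ending at src_addr into the field ending at dst_addr,
    progressing right to left; the destination's word mark ends the add."""
    carry, s, d, src_done = 0, src_addr, dst_addr, False
    while True:
        src_digit = 0
        if not src_done:
            src_digit, src_mark = memory[s]
            src_done = src_mark
            s -= 1
        dst_digit, dst_mark = memory[d]
        total = src_digit + dst_digit + carry
        memory[d] = (total % 10, dst_mark)
        carry = total // 10
        if dst_mark:
            break
        d -= 1

add_field(2, 6)     # add the "275" field into the "0948" field
print("".join(str(digit) for digit, _ in memory[3:7]))   # 1223
```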

IBM 1401 - Wikipedia (a nicely detailed article).

The IBM Reference Manual we used (the version here happens to be from early 1962, just about exactly when I started programming the 1401).

[image]

The punched cards to be worked on were read by a card reader, converted to tapes, processed, the results went back to tapes, were printed, and/or punched back into card form.

IBM eventually made and sold well over 10,000 of these machines (the first mass produced computer in history).

[image]

The IBM 1401 “mainframe”.

[image]

1403 Printer

[image]

1402 Card reader/punch (good for putting program listings on top of)

[image]

IBM 729 Tape Drives — with vacuum tape loop buffering

The 1401 we worked on at ATC had 8K bytes of memory and six tape drives that were very fast for those days. There was no OS, and most debugging was “desk checking”. You could get “3 minutes” once a day to try a program, but you couldn’t touch the machine — you put your listing on top of the card reader and asked the operator to do this and that (making friends with the machine operators was critical).

The basic learning process was to do the one week hardware school, after which you were quite able to program anything on the 1401 (this was one of the fruits of the difficult aptitude test — it probably overscreened, but everyone who did pass it had absolutely no trouble learning a whole tricky machine and its machine code successfully in a week).

Then you spent some months “programming” — which was called “coding”. What was called “programming” back then was design and flowcharting the design. The PCAM machines were all flowcharted, and these were dataflow schemes. These had to be serialized a bit to be converted to computer processing.

A “coder” was thus a human “compiler” — whose job it was to convert the higher level programming language of flowcharts to working machine code.

After a few months of this “on the job training” and assessment one was allowed to do some design, programming, and then coding for new tasks. There were already seasoned (a year or so) programmers — also mostly enlisted men — who were generally very helpful.

There was another larger computer there — the Burroughs 220 — and learning it and doing similar kind of work on it improved the programming and coding for both machines.

A few other things I should mention.

It was possible to be extremely clever with the 1401, and it should be obvious that the wordmark scheme allowed clever memory allocation schemes, overlays, etc.

(It is well worth your time to take a look at Val Schorre’s 1964 “Meta II” system at UCLA, a compiler-compiler that would run in an 8K 1401. I wish I had known about this when I was a 1401er, but its invention happened after I went back to school.)

We were able to make a batch operating system that would run all the shop jobs, and which fit into the top few hundred characters of memory (and using tape to help).

IBM had an extremely capable “tailored macro” assembler — Autocoder — that allowed a wide range of expression and “conditional assembly” to produce highly optimized and compact code.

IBM already had a policy to “make everything run on everything”, and this meant that both the relatively new FORTRAN, and the even newer COBOL had to be compilable and runnable, even on the smaller configurations. The latter never made it into the ATC shop while I was there, but there was a FORTRAN “for curiosity’s sake” (this was because the compiler took about 100 passes (or “pulls” of the tape drive) i.e. a very long time).

Reports Program Generator (RPG) was used for a few jobs (I tried it a few times along with others). It made a few vanilla tasks quicker, but the macro library we had developed was generally more convenient, and only a little more dangerous.

The older larger slower vacuum tube Burroughs 220 did have something really interesting: the BALGOL compiler (Algol 58 with a few Burroughs touches). It had one of the most beautiful manuals, and this got me to read it. One of the officers had been a CalTech grad and had used it there. At ATC, it was also a “curiosity” but as a much “sweeter and cleaner” example of a higher level language.

The next machine in line for ATC was to be a Burroughs B5000, a truly amazing machine whose hardware was made directly for an Algolic language (Algol 58 really), and all of whose software — including the OS (it had one) — was written in higher level form. I learned this machine from the documentation, only understood about half of it, and was back to college before it showed up. (A few years later — by accident — one of my grad school profs was Bob Barton, the main inventor of this marvelous design.)

---

I was able to get back in school for the fall of 1963, and the AF let people out up to 100 days early to not miss a term. I was able to get a good part time job programming for the National Center for Atmospheric Research (NCAR) in Boulder, that would pay my tuition and room and board for the rest of my undergrad years at the U of Colorado.

If one starts with a state of the art object-oriented programming from the late 80s and early 90s (Smalltalk, Self, CLOS), what could have been the next breakthrough in that style of programming, but never happened?

For September 1984, I was invited to write the cover article for the Scientific American special issue on “Computer Software” — back in the heyday when SciAm was a much more serious attempt to bridge the gaps between science and the general public. Within the bounds of this audience, I took a look at some present and future possibilities.

My friend and colleague Doug Lenat was invited to write the article on “Computer Software for Intelligent Systems”, and my friend and colleague Larry Tesler was invited to write the article on “Computer Software and Programming Languages”.

It was extremely tempting to try to convince Scientific American to go to the next levels of explanation to include a much larger look at the possible futures for software and programming. This didn’t pan out.

I included a category in my article — “Ultra High Level Languages” — which suggested that Lenat’s “Eurisko” system was an “almost a new thing” that could bridge to “really new things”.

The tongue in cheek predictive scheme I had come up with for the article predicted the past well, but the next “new thing” didn’t show up. Later I thought this was because the extreme commercialization of personal computing in the 80s and the Internet in the 90s had brought forth a large new ad hoc group of people who wanted in on computing. The weak commercial HW, the new crowd, the “from zero again” contexts, the lack of ARPA-like funding, etc. all conspired to knock expected advances into oblivion.

The research perspective in the 60s and 70s was about qualitative improvements in programming itself, not in extending a particular style (as the current question requests). I still think this is what the discussion should be about, because the real problems are about how to program computers more readily, not about what is currently in place and/or popular.

We have to start with the problems of intent: determining, capturing, debugging, eventually optimizing, etc. We can think of intent as an assertive manifestation of meaning. Whenever I thought about this back in the 60s, 70s, 80s, etc. I would look to see what the “real AI” people were doing (I have to use this term because “AI” has been “colonized” today by what was a small subset of the larger idea back then).

Our thinking about this can be helped when we note that human beings have a very difficult time being crisp about what is sought for, and especially to communicate this to others if they think they have it down. The established fields within Engineering have made great strides over the last 40–50 years by inventing and using CAD tools to incrementally capture and visualize the desires, coupled with SIM tools to vet the designs in real-world situations. There has also been great progress in the third member of the triumvirate: FABrication, and the pragmatic optimizations needed to go beyond what simulations can do.

A supreme irony is that these powerful tools in pretty much all forms of real engineering (there are even CAD<->SIM->FAB tools for “bioengineering”) are done on computers, and most of what they can do can only be done on computers.

But when we look to how “computing” comports itself today, it is hard to see anything remotely at the level of the serious engineering disciplines. In the terms above, a vast percentage of software seems to be done as stabs at FAB only. Some of the exceptions are interesting. For example, at the large scale end of industrial computing, I’ve seen documents of “requirements” in the form of lists, but most with no diagrams at all, even though large parts of the requirements are for UI. There is no attempt to vet the requirements, and no sense that they even make sense in the large.

And so forth. The breakthrough that is needed in programming has to come from a larger sense of what it is supposed to be about i.e. it is about the whole system that is needed to do programming that can be done and trusted in the engineering sense of doing and trusting.

As Doug Engelbart pointed out in 1962, you have to create not just “augmented individuals” but “augmented groups of augmented individuals” in order to be serious about next steps. The augmentations are not just computers and computer tools, but also education, new methods and languages for representation: these have to reflect back into the “augments” that can be manifested on the computer parts of this more complex organization.

What is your review of The Dream Machine: J.C.R. Licklider and the Revolution That Made Computing Personal?

Bottom Line: The best single book about the inventions of personal computing and the Internet, as judged by those who were involved.

It’s the book I always recommend when people ask for accounts of how much of today’s interactive networked technologies and software got invented.

During a discussion of this book, one of the old hands said “The mistakes kind of cancel out” — meaning that the larger sweep of the story is pretty accurate, and was able to capture some of the spirit of the people, the vision, and the work.

The author — Mitchell Waldrop — did more homework with the actual community than any of the others — about 5 years worth — and was able to come up with a workable sequence for the story (this is really difficult, and the other attempts were more confused).

The book starts with the large contexts, including WWII, that got the US government deeply involved in science, engineering and mass industrialization for many technologies, and it properly centers the main story around JCR Licklider, who was a special catalyst, carrier of visions, creator, and funder of the large research community needed for many different kinds of new inventions in computing and networking.

I wrote a tribute to this research community in 2004 — The Power Of The Context — that is both relatively short, and has a pretty good bibliography for people who are interested in a bit more detail on different parts.

A book to read afterwards, especially for stories, is Michael Hiltzik’s “Dealers of Lightning”. This book is not as well organized as “The Dream Machine,” and its style is very much more “The Hero’s Journey” (and this misses some of the most important points). However, Hiltzik was able to get some very good stories in his interviews.

All of the attempts at historical accounts missed capturing one of the most important elements of this community: its ability to cooperate in so many ways. Part of this was due to actual intentions and actions of the funders, and part was due to the particular nature of the grad students and interns — then young PhDs — that ARPA created as part of its “research results”. A very large number of these from the many ARPA projects wound up as friends and colleagues, and being PIs themselves. This was powerful for the culminating phases of personal computing and the Internet, where much of the main work was done by the “second generation” researchers ARPA created.

Why is Machine Learning so hard to explain?

Here is an explanation of “conditioned learning” for complete beginners.

How would one explain AI to a complete beginner?

What would Alan Kay think about a future U.S. President immediately scaling up the circular economy?

The technical issues are many, longstanding, and systems related — and there are many complications from most Americans not understanding them, or the grave dangers the world has been in for several decades now. If this were a conventional war, the enemies have already been allowed to land and establish beachheads.

In the “situation room” of a conventional war — as with other highly complex systems — there would be many kinds of visualizations of what seems to be going on, and from other angles, there would be simulations of the future under many kinds of conditions including the many different kinds of things that need to be addressed and done.

The main thing that needs to be avoided when a complex vital system is under stress is toppling it. The basic idea is that, for a non-linear system (which most of the interesting and important ones are), it generally requires much less energy to topple it than to restore it.

[image]

The “barely stable” configuration here can handle small stresses, but a bit more will topple it, and then orders of magnitude more energy will be required to restore it.

At this point, 57 years after the first scientific warnings of the climate problem, it is essential above all other things to reduce global warming, and most of this will require reducing the greenhouse gases in the atmosphere (not just CO2, but methane, etc.) Methane is especially a problem because there is a lot locked up in the tundra and in the oceans that the current global warming is starting to release.

Even if there were no politics that have to be dealt with, this is now quite a difficult complex problem — not just to do, but especially to understand and minimize the collateral impact of the tradeoffs of the transition. In WWII meat and other goods were rationed, and it is likely that forms of rationing of energy will eventually be required.

A tiny part of the larger problem — with special focus on tradeoffs, and even benefits — is treated in an excellent paper by stellar engineers Saul Griffith and Sam Calisch. https://www.rewiringamerica.org/s/Jobs_White_Paper_Compressed_Release.pdf

An effort of the size needed to deal with the primary crises would have many spinoff benefits and new forms of organization. One of these would be something like a “circular economy”, but one in which the components are mostly also new rather than trying to repatch the current system (which is very far from the aims, the organizations, and the sensibilities needed).

But the first order of business is to head off further global warming as quickly as possible, and this will require not just planning to solve the problem as though people don’t exist, but the larger part of the planning will have to be about how to engineer transitions that will balance the overall threats and solutions in ways that will minimize harm.

This will be the most complex challenge of science and systems thinking ever attempted. The extremely poor response of people and governments to the relatively easy to understand COVID-19 threat, is quite worrisome given that the climate crisis is still much more invisible than the current pandemic, despite the decades of warnings.

The bottle that is the Earth could tip over.

What is the best way for 2 independent programs to communicate with each other? One program reading while other program writes to a file? Or opening a local TCP connection and communicate through it? And which is commonly used method?

This looks like a class homework problem. In any case, it’s the kind of question for which many useful answers are readily available via Google, StackOverflow, etc.

What is declarative programming?

One way to think about this is the distinction between “hows” and “whats” (and perhaps a middle ground of “almost whats”).

Historically, programming started in the “how” realm, with the program as the tactics for making the desired effects from the materials and tools available. This would be equivalent to instructions for building a house step by step, but without blueprints.

Almost always in the realm of practical construction, there is a “what” in mind, and often some work is put into manifesting this before the construction is to begin.

An example of this would be plans for a house, or even a careful scale model, that represent part of a strategy for getting the house made. In both cases the plans are “declarative whats” and humans can often interpret them well enough to organize tactics to build the actual goal. The tactics/hows will often refer and compare to the strategic plan/whats to judge how well things are proceeding.

In programming, we usually start with at least a vague idea that is mostly “whats”. And we have some ideas about how to test the programs so far against these. We usually have less process to vet the “whats” themselves, so it is often the case that the resulting working program will not be what was actually desired (this occasionally shows up in house building as well!).

A perfect collection of tests — where if and only if all the tests succeed then the program is what was desired — could possibly be used as a declarative program. In practice, the tests are not complete, and there are generally no paths to ensuring the adequacy or the completeness of the tests — and most importantly, it is very difficult to just turn tests into a running program (for either humans or other programs).

This could be like a list of requirements for a house, but with no model, or drawings, etc. What is needed is a combination of tools and approaches that allow the conception of the result to be developed in a way that will also allow an automatic program to figure out how to write an efficient program that can find the “hows” from the “what”.

One of the earliest successes was to be able to express numerical mathematics in the “whats” of standard mathematical expressions that showed desired relationships but didn’t show how they could be computed. The “George” system on Whirlwind was one of the first, and FORTRAN followed soon after.

Ivan Sutherland’s Sketchpad system (ca 1962–3) was a startling groundbreaking system along almost every dimension of hardware and software (and I’ve written about it in other Quora answers). For here, it was one of the very first interactive systems to combine design and realization entirely in terms of “whats” — one showed the system “what you wanted” — both in terms of actual pictures and in terms of “constraints” specifying what had to be “true” — and Sketchpad would combine and solve all into an instant solution, which could then be further modified until the desired artifact/model was reached. It was thus not only one of the very first CAD tools, but also integrated SIMulation to vet the Computer Aided Design.

This was monumentally wonderful, and a breathtaking tour de force by a real genius.

Warren Teitelman famously named one of the earliest great attempts to support the tension between the “whats” and the “hows”: DWIM (for “Do What I Mean”, implemented in a series of Lisps, for which he was also the main designer and implementer).

He invented one of the earliest (and deepest) forms of UNDO to deal with many kinds of human errors. He did one of the earliest and best designs to realize something like McCarthy’s “Advice Taker”, and a programmer could do quite a few important things by “Advising” the Lisp system in progress.

It’s worth noting that the “catch” in declarative approaches is that it is very difficult to create a declarative language that will guarantee that anything that can be said in it will have a solution and that there will be enough solutions to make using it worthwhile. (Consider that you can say many things in math that are hard or impossible to actually reach, and that for some of them the system will not even be able to determine that they are not reachable.)

To this end, one of the niftiest things that Don Knuth did when devising TeX was to design the constraint language so that most of what was reasonable to express in it was also within the reach of his solver (a form of “dynamic programming”).

An early example of a purely declarative programming system was the old IBM “Reports Program Generator” (RPG), an attempt to provide a completely declarative system for “producing business summaries”. It worked well enough to survive for quite a few generations, but started losing out to COBOL almost immediately, just because COBOL could “do more”. But this was at the expense of the creation of vast and terrible COBOL programs, many of which are still plaguing businesses (and costing fortunes) to this day.

SQL was an attempt to be “sort-of-declarative” to allow desired sets of elements to be specified more or less “declaratively” and have the system figure out the actual operations to accomplish these. To me this is a parallel to FORTRAN having declarative expressions for arithmetic and comparisons, but having much of the rest of the system deal with “hows” rather than “whats”.
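
To make the contrast concrete, here is a small, hedged sketch (a made-up toy table, using Python's built-in sqlite3 module): the query states the "what", and the loop below it spells out one possible "how".

    import sqlite3

    # A toy, invented table to show "sort-of-declarative" SQL versus
    # spelling out the "hows" by hand.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
    db.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                   [("Ada", "Research", 120), ("Grace", "Research", 140),
                    ("Bob", "Sales", 90)])

    # The "what": describe the desired set; the system plans the operations.
    declarative = db.execute(
        "SELECT dept, AVG(salary) FROM employees GROUP BY dept").fetchall()

    # One "how": the same result by explicit looping and bookkeeping.
    totals = {}
    for name, dept, salary in db.execute("SELECT * FROM employees"):
        count, total = totals.get(dept, (0, 0))
        totals[dept] = (count + 1, total + salary)
    by_hand = [(dept, total / count) for dept, (count, total) in totals.items()]

    print(sorted(declarative) == sorted(by_hand))   # True: same "what"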

It’s worth noting that a good spreadsheet system is “pretty declarative” and it is fun to see just what can be done in the large with a more general version of the idea (quite a bit actually). I have written about this here and there over the years.

A very interesting “middle ground on the side of declarative programming” was the “recursive functions of conditional expressions” that is pure Lisp. There is some “how” thinking in the planning of the form of the declarative expression, but the results were often quite elegant and could be “read declaratively”. This was also the case for the later Prolog: many of its programs could have a perfect declarative reading, but most of them had to be carefully planned so that Prolog could execute them successfully.
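
A tiny sketch of what such a "declarative reading" can feel like (written in Python rather than Lisp, purely for illustration):

    # The definition states what the length of a list is, rather than the
    # bookkeeping of a loop: the length of an empty list is 0; the length
    # of any other list is one more than the length of its rest.
    def length(lst):
        return 0 if not lst else 1 + length(lst[1:])

    assert length([]) == 0
    assert length(["a", "b", "c"]) == 3

The "how" thinking here is in choosing the form of the recursion; the result can nonetheless be read as a statement of fact.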

Still, I think there is virtue here. It is very like requiring the writer of an expository essay to do quite a bit of work to have it turn out to be readable and appear simple. And this gets us back to the design and the debugging of the design issues that continue to be so difficult for many in computing to face up to these days.

Another middle ground that hasn’t been explored deeply enough over the last 25 years or so is attempts to create “runnable meanings” (aka “runnable math in the large”) to be able to nail down desired semantics — the “whats” — with a complete separation from the optimizations — the “hows”. Unit tests don’t really capture this, unless a running system can be synthesized from them. It’s worth looking at the Nile language by Dan Amelang to see a modern version of “runnable math” which has great power of expression, and which allowed formerly large systems — such as all of the 2–2.5D rendering of personal computing — to be programmed in less than 500 lines of clear and understandable code (its “mathematical entropy”).

Another idea with “runnable math” is that the meanings can represent the CAD<->SIM cycles of deep engineering, to create running models (perhaps requiring supercomputers for real-time requirements), and the FABrication part is done as a parallel organization of methods, heuristics, strategies and tactics, etc that are required to actually deploy the system in a practical way. The basic idea here is that you should be able to turn off any and all optimizations and the only thing that might happen is the system might slow down and use more storage, etc.

Next, we should consider something like the “TurboTax” version of programming. This doesn’t require the programmer to be god-like, because it asks questions and provides formats for answers that it can understand, and then synthesizes a program from what it has gleaned. An extremely impressive example of this a few years ago was a system done by CYCorp for the Marines to automatically write official procurement documents (these are enormous, and have so many requirements that only a few people in the Marines could write them — the backlog was crushing). One of the subtasks in this system was a wonderful system that normalized 20 different data-bases of different kinds and vintages so — for example — terms like “employee” which had different meanings in the data-bases could be rendered into a canonic meaning. This subsystem was presented to the rest of the system as a single data base with a highly semantic way to query it, etc.

In many ways, systems such as the above are likely to appear more and more, and they will likely do a better job of holding their territories than RPG did 60 years ago.

One way to look at a great leap forward would combine a carefully designed language for “whats” with a vast collection of “hows” and “solvers” and have a mediator — some future version of CYC? — that can act as a “semantic organizer” and “semantic operating system” — to find and organize the “hows and solvers” into larger composite solvers for given requirements. (It is quite interesting to reflect that this is just what Sketchpad did as one of its many amazing properties: it had three different solvers, plus code to decide which ones to use and how to retreat to more general and slower ones to eventually get the job done.)

I hope I don’t sound like a complete old fart and fuddy-duddy if I note that the taste for these really hard, semantically rich, problems seems quite lacking in the field today, and especially in the universities where one would hope that the funding, encouragement and freedom to deal with hard problems would still strongly exist.

Should schools be remote learning only until fall 2021?

New section added Jan 29th 2021

---

The answer I gave below, unfortunately resembles statements like: “if everyone would just 100% isolate and wear masks, we would soon end the pandemic”. It misses most of what is actually going on, and what is needed to deal with the even larger problems brought by inadequate responses.

For example, under most circumstances, “English as a second language” children need a strong external environment that features English as an everyday means to communicate. And most children are “mathematics as a second language” children, etc. This can be learned in some of the ways I mention below, but in the practical actualities of today, it is not going to happen for most children.

But I still think it is quite wrong at this point to put in danger the adults involved with schooling in physical schools, even if it might be the case that children spread the disease less.

Some children will not have access to computers hooked to the Internet. I think there are possible solutions for them at home that combine printed materials and cell phones — and that using more printed materials is an important idea even if there are computers available.

The key principle between now and the time when it is safe for children to attend physical schools again, is almost certainly “structure” that is supported by media, much of which does not have to be online. But I do think that critical parts of the needed “structure” will require at least cellphones or landlines of some kind. I don’t see how to do this without some parental cooperation (and I think this puts many children in a terrible situation).

---

I shall try to answer what lies behind this question — which in part has to do with how learning can be done, be aided, and most especially shaped for children.

First (with a molecular biology degree lurking in my misspent youth), even with the advent of vaccines, it is very likely that extreme care will need to be taken with the pandemic in the US and the UK until July 2021 or later.

Second, it is very important to realize that the disaster many countries have faced (and will continue to face) was to a large extent avoidable. The virus has to have human hosts to reproduce and a means of getting from human to human. Denying these for about 6 weeks over an entire population will stamp out the threat.

Only a few countries were able to face up to these facts (which anyone who has understood their 8th grade biology course should deeply understand). For example, New Zealand did bite the bullet and face up to a massive solution, and wound up with just 25 deaths (about 5 deaths per million). For comparison the US in just a few weeks will reach 1000 deaths per million — most avoidable — and this is still rising. This factor of 200 in deaths represents a kind of “ratio of ignorance” and lack of thinking skills.

Simple Answer: This almost certainly means that the prognoses of experts such as Dr Fauci — that things will not start getting back to “real normal” until the third quarter next year — are likely to be correct — and that means that children should not be attending school until Fall 2021.

Third, if we include recent poor decisions with regard to the climate and with regard to what I’ll call the systems integrity of a country (including politically), we can see that we’ve had a massive failure of public education, which has produced a majority of adults (some of whom are also parents, teachers and politicians) who are currently quite incompetent with respect to the time in which we live.

Fourth, despite the fact that children do most of their learning outside of school — especially at home — and are genetically set up to model their parents, too many of the parents have decided to outsource the main parts of modern education to schools (and are generally happy to have the schools also work as baby sitting processes as well).

Fifth, the outsourcing also has a larger public purpose: the two main reasons for having a universal public education are highly complementary (a) to help create adults with enough knowledge and perspective to be able to be voting citizens in a representative democracy, and (b) to provide children with high level learning experiences regardless of the background or predilections of their parents.

A high quality educational experience for children should even things out, but the current situation is not very high quality, works for some, but many of the differences are made up according to the backgrounds of the parents. Here is a recent study at Northwestern that looks at the pandemic’s effect on public school across different cohorts of the society. https://faculty.wcas.northwestern.edu/~mdo738/research/ADSZ_Covid_1220.pdf

---

A Larger Answer My premise in what follows is that health — which includes “mental health” — is the top priority, and that most other needs will eventually be aided more by good health in all areas than bad.

But there is also the rock and the hard place, in that many parents (a) are currently caught up with multiple jobs and are struggling to get by, (b) have used the schools to free up their time, (c) are not well prepared themselves in many subjects, and/or (d) are not prepared for actually helping their children learn.

The practical real question now becomes: how can we help every part of this quandary?

I think putting “Children First” is a good place to start from (even though they depend on their parents etc for surviving each day).

When I was a child in the 40s and 50s I was impressed by how my two cousins (contemporaries in age) did “schooling” while part of a family that sailed a large square-masted sailing boat around the world for 18-month voyages. This was done through the “Calvert Correspondence School” and worked well enough for one of them to go to MIT and the other one to Amherst College.

The process back then involved reading and writing, with “motivation” and a bit of help supplied by the parents.

The reading and writing had much in common with the “tutor” methods of the elite British universities, where it is all about the learner doing most of the work, with feedback and guidance from the tutor.

Most of the help came from the replies of heavily annotated versions of their own writings from the Calvert teachers/”tutors”. The parents would help interpret these and keep the kids on track. The process was very efficient (as most organized home schooling can be): no more than 2 hours a day is required to deal with much more material much more efficiently than most schools can or do.

In non-pandemic times, those who home school can then spend much of the rest of each day exploring museums etc, reading for fun (if you don’t read for fun you will not develop the chops to read for purpose), clubs & sports, etc. for social contact, etc.

Back to the sailing boat: of interest here is that sometimes an exchange would take more than a month by mail if the boat e.g. were in the South Seas. This did not hinder, and often seemed to help (like Bruner’s “Spiral Curriculum” it provided a way to review ideas and take another tack through the materials).

The question mentions “remote learning” — these days I think meaning “via computer that connects to something” — but did not say more about its form. The key idea in pedagogy is that there is no known way to pour new knowledge and skills in the learner’s ear: the learners have to create these from the processes in the environment they are put in. “Typical school” is far from a good learning environment, and “remote versions of typical school” are likely to be much much worse.

But we should also be able to expect/require parents to provide “guided motivation” for learning frameworks/processes set up by others. The UK has a national curriculum, but as far as I can tell, the current government has not taken it upon itself to provide the parents with the materials they need to help their children learn, whether pandemic or not, remote schooling or not.

The US is set up to have the states, to some level, have a “state curriculum” that the many local school boards are supposed to follow. As an example here, the state of California — which would have about the 6th largest economy in the world if it were a country — has not provided the parents with what is needed for helping their children learn in both normal and abnormal times.

This comes back to the largest question of what to do if the parents aren’t inclined or able to help their children, even through motivation, and it was the public schools that were making a difference for these children.

And this takes us back to how I started this too-long piece: the general inability of the public to understand the pandemic, and its tendency to replace science with “hoping” and “projections of beliefs”, has not only resulted in a health disaster, but also in disasters in the many ancillary requirements, including children’s education — and much of this because of the failure of the existing education systems to better prepare these adults in the public to make much better decisions in the face of crises. (This is a more than worrisome downward spiral.)

“Many important problems don’t have a half-life”, so “amortizing them into the future” will eventually incur crushing reckonings. Most citizens balk at paying taxes to maintain roads, bridges, schools, and other infrastructure — and their health. But the “interest costs” — what the Mafia would call “the vig” — of the amortizations are huge, and many of these are in the process of coming due.

The “interest costs” of producing ignorant unprepared “so-called-adult”-citizens hit at the direct heart of their society. The last years have revealed some of the dire consequences along a number of important dimensions.

One of many things that needs to be done to start turn-arounds is for the appropriate levels of government to officially characterize all real schooling as “home schooling” (perhaps aided by schools) and to prepare the materials etc. to help parents really take seriously, and learn, how they can make a qualitative difference, not just for their children, but actually to wind up saving the entire world (do I have to again mention the still almost-invisible-to-most climate crisis whose import dwarfs the pandemic?).

What are some historic examples of elegant, streamlined operating systems?

A fabulous super-efficient elegantly designed HW/OS was “Project Genie”, an ARPA Project at Berkeley in the mid to late 60s. Project Genie - Wikipedia

It is most famous for three big uses of it. The first was that it turned out so successfully that ARPA wanted more, and more or less forced SDS to build a commercial version (the Wikipedia article is sanitized and inaccurate). This was used to form the Tymshare Corp, the first commercial time-sharing company.

The second was that this machine and OS was the basis for the Engelbart NLS system and was the computer that sustained “The Mother of All Demos” in 1968.

The third was that the design of the OS was copied by BBN and used to make the Tenex OS for the DEC PDP-10.

A few notes here. They had a small SDS 930 computer (64K 24-bit words, 0.5 MIPS) and modified it to add their own home-built memory protection and page relocation hardware, etc. Mel Pirtle was the main factor for this. Chuck Thacker (of later Parc and Turing Award fame) was one of the builders of the HW.

The OS was quite ingenious for any day, and was primarily the design of Butler Lampson (also of later Parc and Turing Award fame). Peter Deutsch (of Lisp and later Parc fame), did much of the critical programming.

It had several features that encouraged programmers to let the system know in advance — when they could — their intentions, and in return they would be rewarded by having their program run on the fast queue (and if they violated their predictions, they were punished by being put on the slow queue).

A unique feature (I don’t know of any exceptions) is that the working set given to any process was only about 1/4 the available physical memory. This allowed the OS to bring in the next three processes from the disc/drum secondary storage and thrashing was pretty much eliminated.

Another interesting wrinkle to the design is that Butler “designed the system to crash” — meaning: there were crashes of computers in those days, so instead of pretending that all was well, Butler took the opposite tack and set things up so that a crash could be recovered and resumed in a few seconds or minutes. One of the schemes was to mark pages as dirty or clean and to use a few percents of the scarce resources to preemptively write out dirty pages, so that there was almost always enough clean storage to bring pages in.

There were many other interesting features for inter-process protection and communication, and the particular way the “user machine illusion” was presented to programmers.

What kind of thought process do you need to understand programming?

Even if you are programming in a declarative language, you are organizing a structure of interacting relationships whose realization on a computer will cause desired effects to happen.

The next levels of thinking can feel very different, to such a point that many “programmers” only choose a few ways to think. For example, classic imperative data-structure-munging programming thought is very different from classic recursive “Lisp-type” programming thought — the first is very direct and straightforward, and also prone to many kinds of errors — the second is “strategic”, subtle, but also far less prone to bugs. “Real OOP” design and programming is rather different from either of these, but most good versions of it resemble the second.

However, for the purpose of this question, I think the simplest dimensions to ponder are (a) the one that stretches from the tactical to the strategic, (b) the one that stretches from “tinkering” to “real engineering design”, and (c) the one that stretches from the ad hoc to the mathematical.

A much more subtle, and ultimately all important, dimension is (d) the one whose rich direction points to “real science”. This is where the kind of thinking can be done that leads to “real inventions” and “really new perspectives” on what programming could mean.

How do computers deal with images, written texts, videos, etc.? Does it transform them into 0 and 1 codes?

Yes. A good way to think of this is that a binary digit — a bit — is not really a “0” or a “1” but just “off” or “on” (or “not there” or “there”). In other words, it is like a simple mark on a wall or paper. So a bit is more intuitively represented by symbols that look more like “not there” and “there”, for example: ☐ and ◼︎

It’s the combination of “there”/”not there” that provides the way to represent ideas. Also see (Analogy) below for an analogy to markings on paper.

So, eight of these together can signify 256 different ideas (there are two ways to pick each of the eight “bits”).

The way to think of this is like a dictionary, where the pattern of eight bits is like a word to be looked up, and something else (often us) has the meanings.

These 256 patterns could be a representation for 256 different numbers, or 256 characters in an alphabet, or 256 colors (or just 256 shades of red), or 256 levels of volume for a sound, etc.

When we want to use a sequence of bits to represent a number, the order of the patterns is chosen to make both comparing (larger or smaller) and arithmetic (+, -, etc) as easy (and as fast) as possible for the hardware.

Just for picture elements (“pixels”), usually at least four of these patterns made from 8 bits are used: one for each primary (256 shades each of red, green, and blue), and the last for 256 degrees of transparency.
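
A small sketch of how those four 8-bit patterns might be packed into one 32-bit pixel (the particular byte order shown is just one common convention, not the only one):

    def pack_rgba(r, g, b, a):
        # Four 8-bit patterns (0..255 each) side by side in one 32-bit word.
        return (r << 24) | (g << 16) | (b << 8) | a

    def unpack_rgba(pixel):
        return ((pixel >> 24) & 255, (pixel >> 16) & 255,
                (pixel >> 8) & 255, pixel & 255)

    p = pack_rgba(200, 120, 30, 255)          # an opaque orange-ish pixel
    assert unpack_rgba(p) == (200, 120, 30, 255)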

Visible text is made from a combination of a pattern for the alphabetic character — for example, the letter “a” — and a pattern for the visual representation on your computer screen — a picture made from “picture elements” as described in the previous paragraph (this might require about 1000 of them organized in a rectangle).

Similarly, a sound is made from a sequence of “volumes” (“amplitudes”), typically around 44,000 (or more) for each second.

How many bits need to be used for each amplitude (which approximates the relative height of a waveform at that point in time)?

Using 8 bits — so 256 heights (+ or - 128 heights from zero) — can work but will also introduce some noise. 16 bits (+ or - 32,768) can often work, but still can have noise (and there are many tricks for trying to make this work). A stereo channel would contain a sequence of these for each ear. Today, 24 or 32 bits (or more) are used to represent the amplitude for each “sound element” in each channel to get the noise level so low it can’t be heard (sometimes these will be carefully reduced back to 16 bits to save space).
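
A rough sketch of why fewer bits per amplitude means more noise: with fewer bits there are fewer representable "heights", so each sample lands farther from where it should be (the numbers here are illustrative, not a real audio pipeline):

    import math

    def quantize(sample, bits):
        levels = 2 ** (bits - 1)            # e.g. 8 bits gives +/- 128 levels
        return round(sample * (levels - 1)) / (levels - 1)

    # About 2.3 ms of a 440 Hz tone sampled 44,100 times per second.
    wave = [math.sin(2 * math.pi * 440 * i / 44100) for i in range(100)]

    for bits in (8, 16, 24):
        worst = max(abs(s - quantize(s, bits)) for s in wave)
        print(bits, "bits: worst-case error about", worst)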

(Analogy) A way to reconcile all this to the world of human senses is to note that all we need is paper and the presence or absence of ink to represent words, pictures, music, etc. What’s on the paper are patterns that are supposed to signify meanings that we can “look up” in our minds. The computer versions of these are quite similar, with the really interesting and added property that the computer can read and write its own markings (to make “a book that can read and write itself”). This is extremely important.

Since programming ultimately comes to controlling computer hardware (even if we don't usually think of it that way), how is it that the same programming languages work on many different type of computers with different hardware?

A good way to start thinking about this is that most computers are “universal”: meaning that given enough memory and time, they can compute anything that can be computed. This means that — for any imagined effect — there exist combinations of each computer’s hardware instructions that can cause that effect.

For example, a computer might not have a multiplication instruction, but if it is universal, a combination of the instructions it does have can be contrived to multiply two numbers together.
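
A small sketch of that contrivance (shifts and adds standing in for the instructions a machine without multiply might still have; non-negative integers assumed):

    def multiply(a, b):
        # Multiplication built from add, shift, and test, for a machine
        # that has no multiply instruction (non-negative integers assumed).
        result = 0
        while b:
            if b & 1:          # lowest bit of b set?
                result += a    # add in the current doubling of a
            a <<= 1            # double a
            b >>= 1            # halve b
        return result

    assert multiply(37, 21) == 777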

In the large, this means given enough memory and time, any computer can simulate any other computer “to the bit”.

Most programming languages aim to present a “convenient computer” for humans to program in. The features are in terms of ideas small and large — for example, most of them will provide numbers and a multiplication operation.

The job of the program that translates from the ideas to a particular computer is to make sure that the meaning of each idea is rendered accurately using the low level instructions for each computer (these can look very different, but their effect will be the same).

In computerish terms, the “front-end” of a programming language will look the same from computer to computer, but the “back-end” will be specific to each specific hardware used.
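
A toy sketch of that split (both the little expression tree and the pretend "machines" are invented for illustration): one front-end meaning, rendered two different ways, with the same effect.

    expr = ("mul", ("num", 6), ("add", ("num", 2), ("num", 5)))   # 6 * (2 + 5)

    def stack_code(e):
        # "Back-end A": emit instructions for a pretend stack machine.
        if e[0] == "num":
            return [("PUSH", e[1])]
        return stack_code(e[1]) + stack_code(e[2]) + [(e[0].upper(),)]

    def evaluate(e):
        # "Back-end B": no instructions at all, just compute the meaning.
        if e[0] == "num":
            return e[1]
        left, right = evaluate(e[1]), evaluate(e[2])
        return left + right if e[0] == "add" else left * right

    print(stack_code(expr))   # different low-level renderings on different machines
    print(evaluate(expr))     # but the effect must be the same: 42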

What is Alan Kay's MBTI type?

Astrology posits 12 “types”. But this particular sorting doesn’t work well, so “epicycles” are introduced, and these allow pretty much any observed fact to be rationalized to the system.

Myers-Briggs is more interesting in a number of ways, but also eventually introduced “epicycles” e.g. the blending of the oppositions to allow more explanations to seem to work. Similar kinds of paragraphs to astrology, etc.

All of these are mainly part of the human need for resonant explanations (even if the explanations are just stories and not in accord with close observations) that has been part and parcel of cultural/religious practice in human history.

The “psychometrics” idea has more worth, if it can be used only for what can be done with it so far. (And many might be surprised to know that one of the inventors of psychometrics — besides Carl Jung — was Timothy Leary.)

For example, if you stick strictly to just 4 dimensions and oppositions, you will wind up with 16 categories. These aren’t enough to deal with normal ranges of human variation, so do they provide any help at all, or is this just another astrological scheme in modern dress? Or a different form of tribalism with overgeneralizations that can be dangerous?

One “type” that has been examined in the 20th century (and recently) is “the authoritarian type”, and I think there is something to this. Myers-Briggs devotees might be able to find a way to squeeze this into MBTI, but it seems like a real dimension of human personality — and it fits with William James’ notion in the 19th century of “tender-minded” and “tough-minded” types. I don’t think this quite works with e.g. P/J.

My conclusion is that these ideas — as with many anthropological and genetic ideas — can be useful if packaged with considerable sophistication, but they are likely to be dangerous otherwise.

One technique for “sophistication” is to collect different perspectives on an idea. One of my favorites is the book “Maps of the Mind” (Charles Hampden-Turner), with 30 or so different views. Some of these might be more accurate than others, but it is the collection of them all that really helps thinking about the complex processes between our ears (and to some extent in other parts of our bodies).

Along these lines, in a comment below, Robert Mudry gives us a link to another trait analysis. I was vaguely aware of this from decades ago but had forgotten it. If you put it in collection with other schemes, the group of perspectives does help one think about and talk about this complex subject: Big Five personality traits - Wikipedia

Since everything can be represented by objects, what is the successor of Smalltalk? What system could be comparably minimal, yet more powerful?

Kernel systems that are written in themselves — like Lisp or Smalltalk — can morph into any new idea and claim that “the potential” was always there …

But just to start readers thinking about this, a number of the “details” of how Smalltalk was implemented could be vastly improved today.

Just to name five of quite a few …

the messaging apparatus could be made to more easily deal with the many kinds of messaging “negotiations” that are needed when things are scaled up considerably from the size of things we dealt with at Parc

The internals of objects and their interrelationships could be much more like (a) the address spaces they were intended to be, (b) able to recursively hold systems of objects in communication, and (c) set up to minimize dependencies at every level

a more flexible approach to the forms of messages (and hence to the syntax of the language) could be devised — I would look at “coordination languages” (like Linda) and “relational languages” (like Concurrent Prolog) for inspiration

a more semantic and specific approach to “time” could be installed: for example along the lines of McCarthy’s “fluents”, Reed’s (and Croquet’s) pseudotime, etc. This could also be used to deal with a much larger and comprehensive set of ideas about “doing” and “undoing”, “possible worlds” reasoning, etc. (along the lines of “Worlds” by Warth and Ohshima).

a stronger approach to meaning, ontologies, etc. (a) defining meanings, (b) running meanings, (c) separating optimizations from meanings

And there are more.

Such a “Smalltalk-20” could be gradually made out of one of the existing Smalltalks that has full reflection and something like SLANG in it to bootstrap the kernel (e.g. some Squeak derivative).

What are the five features of the object-oriented paradigm that you consider to be important for good software engineering practice?

“Engineering” is the principled process of creating working systems that are in accord with both our immediate goals and requirements, and also in accord with contextual requirements (such as integrity, safety, predictability, maintainability, etc).

Putting “Software” in front of “Engineering” is a bit misleading, since it is the latter that has to dominate in order for the two terms to mean something in combination. So I like to use EOS (“Engineering Of Software”) instead.

Systems are often described in terms of parts and their relationships, connected by intercommunicating interactions (because most systems have some non-linearities that don’t allow analysis into completely independent parts).

Much of design has to do with finding parts relationships that allow (a) the interactions to be minimal (on the one hand), and (b) the interior complexity of a part (usually also expressed as a system) to also be as minimal as possible.

This suggests some useful tools:

a “part” construct that can protect its inside from its outside, and vice versa (a tiny sketch of this appears after the list)

a “communications” construct that can convey interactions and deal with dependencies

a “system” construct that is a combination of parts and communications that can fit in a part recursively, and everything is made this way

the “messages” that are communicated are also in terms of the systems

the system that is being made, is made with the same kinds of system

changes can be made safely to the system without having to rebuild it and while it is running

and so forth
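
A minimal sketch of the first few tools in the list above (in Python, which only protects insides by convention, so this is an illustration of the idea rather than a real realization of it): a "part" whose inside is reachable only through the messages it understands.

    class Account:
        def __init__(self, opening):
            self._balance = opening           # the inside of the part

        def receive(self, message, *args):    # the only doorway in or out
            if message == "deposit":
                self._balance += args[0]
            elif message == "balance?":
                return self._balance
            else:
                return "message not understood"   # the part decides its response

    a = Account(100)
    a.receive("deposit", 50)
    print(a.receive("balance?"))              # 150
    print(a.receive("explode"))               # the inside stays protected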

These tools allow some useful abstractions to be devised, including: design of algebras of communication (now called polymorphisms), various approaches to ontologies using analogies, composition, etc.

These also count as tools, but live in a more complex world of human goals, intentions, ways to describe things, etc.

Notice that all these only help design by being able to encompass what designs and systems and descriptions are. They have the same kind of usefulness as language, mathematics, and other kinds of media. They are hugely useful, but the designers still have to understand how to organize the materials to create “art” both with a small “a” and a capital “a”.

The “Oop” that was done in the ARPA/Parc community (e.g. Smalltalk) was just such a collection of tools. We still had to do a lot of design, but it was able to represent our designs without too much added noise, and this allowed the iterative processes of designing and building to be done relatively easily to result in lively small robust evolving systems that were close to what we intended.

Some of the things not carried through as far as they could be had to do with dealing with large scalings (e.g. this can be done better in Smalltalk than we did, but our priorities were to get maximum expression within the tiny computing resources we had in the 70s, and these aspects worked out well with our approach).

How much computer science knowledge has been lost because it was not published or documented, or no working code has been shared?

If truly lost, then we can’t know enough to answer the question in any era.

Given that the vast majority (this is an estimate from personal experience) has never taken the trouble to even type “E n g e l b a r t” into Google and look closely at even the first five hits, I could ask “What does it matter if knowledge is being lost, if the knowledge that hasn’t been lost is being ignored?”

In the US, this is similar to the free public library problem: great knowledge is in there, but you have to step in the door and get to work. This is particularly painful because the source of many of these public libraries was Andrew Carnegie, and part of the funds donated required each library to have at least two special rooms: one for the teaching of reading, and one just for children.

Both of these were because Carnegie was a child laborer sent to the US who had the good fortune that one of his bosses opened his home library on the day off of his laborers so they could make use of the books there. Carnegie educated and grew himself there, and never forgot.

This is worth comparing with the almost blind building on top of the Internet by people who (a) don’t know about Carnegie libraries, and (b) don’t realize that part of what needs to be supplied are the modern equivalents of each of the two special rooms in every Carnegie library.

Andrew Carnegie had context and helped others to acquire it, but the web today — in a memorable phrase — is “the context of no context”.

What is Seed7 programming language about?

With programming languages, quite a bit of assessment should be done by writing various kinds of systems (and especially “reference systems”). I haven’t done that, so I’ve little to say here.

I did find that Seed7 is closely based on the theses of Thomas Mertes, and I do remember (to some extent) these from the 80s.

I think there are three or four “ideas worth contemplating” in the aims of this design, one of them being to try to incorporate a useful kind of syntax extension along with the semantic extensions gotten by various kinds of modules — e.g. subroutines, forms of objects, etc.

Most extensible languages — including the ones we did at Parc — lack “fences”, which should require some form of key to be allowed to use certain kinds of extensions (that can be very useful, but also create both dangers and messes if used willy nilly).

It would be great to see a modern extensible language done in the light of what is actually known today about what is needed.

Does Alan Kay know about Brightmark? The company that is taking recycled waste and making renewable fuel. The company already partnered with BP. If not, I hope this reaches him.

Yes, I am aware of Brightmark, but not in detail.

For the last several years I have been helping The Ellen MacArthur Foundation of the UK, a non-profit that is highly concerned with the climate, waste, circular economy, etc. Circular Economy - UK, USA, Europe, Asia & South America - The Ellen MacArthur Foundation

The world is almost 60 years late addressing these deep issues (first scientific climate warning by NSF was in 1963), and only a fraction of what is needed is being done today.

I’ve been quite interested in playing the lyre, but I have no knowledge on it and since it’s not a very popular instrument, there aren’t guides. My question is, what’s a good amount of strings? I see 7 string, 10 string, 16 string etc., so which?

There are guides on YouTube, and some very good examples of different kinds of lyres. There is a family in Greece that makes historical replicas. Take a look.

Does Alan Kay know or even have met Russell Ackoff on Idealized Design?

Interesting question!

My introduction to systems thinking outside of computing came from a few science fiction books in the 50s, reading a lot of the Cybernetics literature, being a molecular biology major in the early 60s, and then reading quite a few books that were listed in “The Whole Earth Catalog”, mostly classics e.g. by Bertalanffy, Waddington, and the now classic “Notes on the Synthesis of Form” by Christopher Alexander.

Within computing, complex systems design was a hot topic in the 60s for operating systems, computer networking, and large complex applications e.g. Sutherland’s “Sketchpad” system.

There were a few emergent languages in the 60s that took a systems and simulation view — for example, Simscript, Simula, SOL, and Sketchpad again.

I think the first systems thinker who was outside of computing or cybernetics that I met was Peter Senge, and he operated in the same general “management guru” space that Russell Ackoff did. However, I did not read any of Ackoff’s books back then or meet him. As far as I can tell, none of his early books were mentioned in the Whole Earth Catalog.

As a result of this question, I got and read three of his essays, and ordered the “Idealized Design” book (and will report on this after looking at it). Ackoff’s essays were good. The early one (1971), for the journal “Management Science” (a very “interesting” term all its own), is expository, and should have been very useful for its intended audience (I seriously doubt that any but the tiniest fraction read and understood what he was trying to get them to see). The later two essays were written after decades of not being able to get business people to take systems views or learn about systems. They parallel Peter Senge’s experience very closely.

I have a copy of Peter Senge’s excellent book “The Fifth Discipline” — mostly about systems thinking for management — here in London and could not find Ackoff as an entry in the index.

---

OK, I have now gotten and read the “Idealized Design” book, and some other writings (Ackoff wrote many books and papers). I also found that my friend Peter Senge wrote a tribute to Ackoff after he died at age 90 in 2009. And that Peter Drucker also acknowledged Ackoff as a major influence in changing some of his thinking about management.

There is too much to sum up for a Quora sized answer, but here are a few thoughts I had while looking at this material that was new to me.

First, “Systems” should be more of a field than it is: it has a wealth of “phenomena of interest”, and is manifested in many kinds of ways in the world, and at most scalings. As it stands now, “systems thinking” is explored and expressed at more local and restricted levels than it deserves. A particular local milieu is “management studies”, where Ackoff (and Senge) spent much of their careers both studying and attempting to improve. We could add in Jay Forrester and “System Dynamics” as another branch of this within “management”.

Second, the fundamentals of “Systems” have been rediscovered many times, sometimes with great insights and sometimes with less understanding. A real field could really help the foundations of this kind of research and thinking.

Third, much of what’s important about “Systems” and “Systems Thinking” has fallen on deaf ears pretty much everywhere in the wide variety of localized environments that have sprouted discoverers and proselytizers. This includes not just “management” but also, to a surprising extent, “computing”. In all areas, even when there is evidence that some people know better, sound principles are violated for reasons of expediency and with the hope that the side effects will not be too damaging (these hopes have generally not been fulfilled).

Basically, most people want to be able to think about things separately, as though there are no connections to other things. And they are all too willing to allow large scale (even planetary scale) death and destruction to be part of the consequences (e.g. the pandemic, the climate, etc.).

Similarly, the “Idealized Design” book is about a principle that is well known in design, but is so often not followed — again for many human reasons — that it was worth writing a book about it (and again aimed at management). An actual Design field — which would be a sibling of an actual Systems field — would have supplied the authors of this book with an even stronger set of examples to try to get management people to understand (for example, the book was apparently published in 2006, but had essentially nothing to say about the interactive, personal, networked computer revolution, even though many of the best inventions were indeed teased out via “Idealized Design” going all the way back to the late 50s and early 60s).

One way to lift all the areas with regard to what they leave out, would be to try to help practitioners to put some of the effort they now put into quick local solutions to simple seeming local problems, into ramifying out possible and likely consequences into larger environments and longer time periods.

Would artificial intelligence be possible if computer programming was not invented?

Here’s a somewhat different perspective on this.

“Artificial” means something made (usually by humans). “Intelligence” is more difficult to pin down — humans like to think that they are, so let’s take an informal view of this.

So “artificial intelligence” could be construed to be “processes using methods that exhibit ‘intelligence’” — especially “great intelligence”.

Now let us look around for such processes, either in existence or as ideas.

I have a decent sized list of these, but let me just put forth Science as one of the most interesting and powerful examples of “artificial intelligence”.

I will be happy to explain this idea in response to comments, but I think it will be more useful and interesting to start off with as a “ponderable”.

And then many of the other things on my list will be easily discoverable. It’s very worthwhile going back in history (and prehistory) to think about the earliest human inventions of “artificial intelligence”.

Have you ever been asked to teach someone how to program computers by a person with no such background and how did you respond?

This hasn’t happened for some years now, but I’ve always tried to help them (we all started “with no such background” …).

The two main parts are “what does it mean to do something at all”, and “what does it mean to make and interact with systems”. *

---

What follows, I first put in a reply to Petar’s comment below, but I think it should be part of this answer:

I have always done my work in the context of a group that is set up to maximize a wide spread of talents and abilities. Just as “science” is “a better scientist than a scientist”, such a group is “a better programmer and systems designer than any individual”.

Really learning to program — etc. — is really learning about how to be part of a scientific/engineering/design team. As with a sport like basketball, there are lots of fundamentals that all need to do extremely well, and there are also “special abilities” that not every member will have, but which the whole team will have — plus the extras of synergy — when the group is functioning well as a team.

How was the first graphical user interface (GUI) made on a computer in the absence of any other graphic software?

I’m writing this because the 4 already existing answers — well meaning perhaps — range from being “pretty wrong” to “very wrong”. They all appear to have been done without any checking of history (and are perhaps mostly “just made up” from scraps that the authors have come by).

To start with, we should at least run down one or two candidates for “the first graphical user interface (GUI)” as the question requests. Both of the “earliest” (aka “very early”) programmable computers, ca 1949 — the Cambridge EDSAC and the Manchester Mark I — could show pictures on CRT tubes, and both pretty quickly implemented games using them — TIC TAC TOE on the former and checkers on the latter. But neither had a pointing device.

So, let’s restrict ourselves to something that both has a display and some kind of pointing device, where something we do with the pointing device affects something on the display via the computer interpreting the inputs and acting accordingly.

One of the earliest pointing devices was the “light gun”, invented by Bob Everett as part of the MIT Whirlwind computer development starting ca 1948, and starting to be operational ca 1950.


Whirlwind I control room with CRT display ca early 50s

The Whirlwind display was essentially an oscilloscope CRT, and the hardware command was an instruction that would put up a dot at an x-y location.

The “light gun” had a photocell that could see a dot on the screen if it were there, and cause an interrupt. A picture of some kind — such as a character or a line — was made by progressively plotting points. This could be “seen” by the light gun, and by looking at the code after the interrupt, the dot that was seen could be identified, and then the structure the dot was part of could be deduced.
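
A rough sketch of that deduction (the names and structures here are invented; the real systems worked at the level of machine instructions and interrupts): every plotted dot remembers which figure it belongs to, so a "hit" can be traced back to the structure.

    figures = {
        "lineA": [(10, 10), (11, 11), (12, 12)],   # a line, as a run of dots
        "charT": [(40, 10), (41, 10), (42, 10)],   # part of a character
    }

    display_list = []                               # (x, y, owner) for each plotted dot
    for name, dots in figures.items():
        for (x, y) in dots:
            display_list.append((x, y, name))       # "plot" the dot, remember its owner

    def light_gun_hit(x, y):
        # The photocell saw a dot at (x, y): deduce the structure it is part of.
        for (px, py, owner) in display_list:
            if (px, py) == (x, y):
                return owner

    print(light_gun_hit(11, 11))                    # -> "lineA"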

This point-plotting and light-gun scheme was used in Whirlwind II — the immense SAGE air defense system computers in the 50s.


A SAGE console from the mid-50s showing the CRT and light gun pointing device

The experimental SAGE computer also used this scheme: the Lincoln Labs TX-2, which is most famous for being the computer that Ivan Sutherland used to “invent interactive computer graphics” as the large idea we have today via his “Sketchpad” system, which was done during 1962. Sketchpad also had a real “Graphical User Interface”, so it could be claimed to be “the first”.

The TX-2 computer also plotted points, but now had a single instruction — in a 36 bit word — that contained the x-y address where the point should be plotted, and the address to jump to if the light pen saw this particular point. The TX-2 was perhaps the first computer to have many instruction counters that allowed processes to be interleaved without requiring conventional interrupts.


Drawing a rivet using a light pen in Ivan Sutherland’s Sketchpad

Ivan was not the first to create a whole graphics system in software from a HW point plotting instruction, but what he did do was so monumental compared to the rudimentary line drawing efforts, that he is universally regarded as the father of not just “interactive graphics”, but also of the “graphical user interface”.

So: as with most things regarding computing, just the tiniest bit of HW/SW connection to I/O plus SW is enough to make a GUI. The key ingredients are insight and design.

Why is it so important to invent the future from the future rather than incrementing from the present?

Incrementing from the present and inventing in the future both have their useful niches.

My research community — ARPA/Parc in the 60s and 70s — was aimed at qualitative changes that were quite distant from increments, for example: the Internet and Ethernet, personal computing with bit-map screens and pointing devices, 2D and 3D graphics and graphical user interfaces, virtual reality, dynamic object oriented programming, laser printing, outline fonts, and many more inventions, each of which created new categories and often new industries.

So ARPA/Parc was interested in the extremes of “invent the future from the future rather than incrementing from the present”.

For these kinds of visions and goals the “present” — and the past that made it — are so vivid that they make it difficult to think about anything that isn’t already there in some form.

The tools in the present are also a big problem. With big new ideas, a lot of work will have to be expended on workarounds, and this will slog down and perhaps kill progress.

If computers are involved, all the above applies, with the addition of possibly being able to do enough optimization to get weak HW and SW to still do something new.

However, once you take off your design hat and put on your optimization hat, it is very difficult, and sometimes impossible, to do further design — it is especially difficult to change your mind while trying to invent (and this is usually necessary).

So, psychologically, we need to somehow eliminate the present as much as possible.

Part of the spirit of the ARPA/Parc community came from some of the previous and enormous WWII invention and engineering efforts. One of the principles was to invent and make everything from scratch if that was called for. This was carried through to the Cold War science and engineering, and for computing, especially the SAGE air defense systems of the 50s. The scale of all of these was titanic.

I wrote a white paper last year for the Ellen MacArthur Foundation in the UK — called “How?” — that was about this, and I’ll put in a link for this later (thanks to Phillip Remaker for finding this link for me: https://internetat50.com/references/Kay_How.pdf).

The culture created by this was that young people in the ARPA/Parc community were immersed in thinking about “What is actually needed?” without worrying overly much about how difficult it might be. And certainly not to worry about existing machinery, vendors, tools, etc.

The result was quite a bit of building scratch HW and SW systems when needed, sometimes taking an existing supercomputer and rewriting all of the SW so it could be used by a single user, sometimes making new mainframes or personal supercomputers, etc. This included new I/O gear such as new kinds of displays, pointing devices, printers, etc.

In order for this to not just be a bottomless pit/black hole, quite a few skills had to be highly developed to pull off all the inventions and tool building in a timely fashion, etc.

A general feeling towards the end of the 50s — partly from the appearance of the first integrated circuits — was that the supercomputers of the present would likely be 100s of times cheaper and smaller in a few years (maybe a lot more). By the middle of the 60s Gordon Moore started publishing analyses of where ICs could possibly go, and this optimism (“a lot lot more!”) fueled the ARPA ideas about “working in the future”.

The basic idea was to maximize and elongate degrees of freedom and to be able to work as though one was already in the future (so that the kinds of optimizations that needed to be done would be as minimal as possible, and not cripple the invention processes).

This was especially critical for ARPA/Parc because it was all about personal computing that was connected so groups of people could collaborate (Engelbart, etc.). This meant that “the future in the present” had to have lots of users of the future systems and these systems had to have sufficient real-time responsiveness.

For example, at Parc we invented, designed, and built almost 2000 “personal supercomputers” in the early 70s that had the computing power of the commodity computers in the middle 80s and later.

Similarly, the ARPAnet and Internet inventions used some of the wires of AT&T to carry bits, but invented and built all of the new kinds of packet switching systems and software needed, including the first routers, etc. and put them on top of the AT&T wires.

So:

“Go into the future to invent, and then bring it back — and this means you have to make the future to go into where you can invent the future”.

Sounds like a paradox — and it is not generally done today — but take a look at the results …

Has the real computer revolution happened yet?

I’ve answered versions of this question before on Quora.

In my research community — ARPA/Parc in the 60s and 70s — “computer revolution” meant something like the next qualitative leap from the “writing, printing, math and science” revolution. In other words, something on a scale of hundreds to thousands of years.

I think the general sense of “computer revolution” today has to do with the pervasiveness and convenience of technologies that for most people can imitate various forms of old media, and distribute them more widely.

If we try to shape future qualitative leaps that are reasonable analogies to the big deals from and about “writing, printing, math and science”, almost nothing that most people learn and do today matches up.

So, a kind of “computer revolution” has happened — and it is having consequences in the general public — but the one we were sure was possible — and partially invented — hasn’t happened for the public (for whom most of the previous revolution of “writing, printing, math and science” hasn’t happened yet either).

How was version control handled in the early days of PARC?

“In the early days of Parc” would be when there were Altos, Ethernet, Laser Printer service, but not yet a comprehensive set of remote file services. (I’m leaving out here the simulated PDP-10 mainframe — “MAXC” — that we built, which had quite a few versioning features, because these came with the “Tenex” OS.)

Each Alto had one or two “Model 30” disk drives with 2.5 MB disk packs. Some of the systems run at this time did carry out forms of versioning.

For example: “Bravo”, the WYSIWYG word processor that became MS Word in the 80s, did both automatic file versioning and event replays from checkpoints.

Smalltalk kept an automatic history of “changes” to code and could revert. Its virtual memory system also did automatic and invisible checkpointing every 20 seconds or less to guarantee a pristine recent system state in case of any kind of crash.

A later more elaborate — and terrific — fine grain dynamic versioning system was done for Smalltalk — called “PIE” (by Ira Goldstein and Danny Bobrow) — but this was not in “the early days”.

Why isn't meta-programming more common in software development?

There are 4 existing answers as I write this, each of which has something useful to say.

“Meta-programming” at least means “programming the programming system” you are using to add features the programming system doesn’t possess.

Note that vanilla procedures can do this for many kinds of things in a language, so they furnish a kind of semantic extension. A granddaddy HLL — Algol — had two features that allowed this to be taken rather far: (a) call by “name”, and (b) a way to go beyond vanilla procedural syntax by having an alternative to the comma.

Call by name sends the unevaled parameter and environment.

The alternative to the “,” was “): <any text>(“.

The first allows a conditional expression to be defined as

if (a, b, c)

where the b and c expressions will be sent unevaled with their environment, and then evaled or not with respect to that environment by the body of the procedure depending on the value of a.

The second allows this to be written alternatively as:

if (a) then: (b) else: (c)
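
A minimal sketch of the same idea in a present-day language (Python here, with zero-argument lambdas — “thunks” — standing in for call-by-name; the function names are just illustrative):

# A user-defined conditional built on an approximation of "call by name":
# the b and c expressions are passed unevaluated as thunks, and the body
# decides which one to evaluate, depending on the value of a.
def if_(a, b_thunk, c_thunk):
    return b_thunk() if a else c_thunk()

x = 0
# The "then" thunk (1 / x) is never evaluated when x == 0, so no error is raised.
result = if_(x != 0, lambda: 1 / x, lambda: float("inf"))
assert result == float("inf")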

Another way to think about all this is that quite a bit of program design these days winds up designing and defining APIs of one kind or another (including the interfaces to “objects”), and any library that will be used also connotes semantic extensions to the language. So it makes perfect sense to allow these to be cleaned up and made more readable via syntactic extension.

Note that this can be done with macros, but the language has to have the quoting mechanisms to include passing environments as well as values (lambdas are one way to do this etc). Macros have the drawback that they are too syntactic (they are expanded as a “soft eval” at compile time), and thus the debugger has to be really capable to allow real-time debugging without confusion (most aren’t).

A better way to do all this is to start with a language that has deep reflective properties, and to consider anything meta as powerful and dangerous (meaning get really good before you start to mess with the meta-properties of your language — this is a different order of design).

Our own experience with this is that “good programmers often make bad choices in the heat of debugging”, and every extension puts a learning curve burden on every other programmer, especially if the extensions are clever and hide their parameterizations.

The flip side is the same principle for tool making in general: you can get into deep death spirals trying to make your own tools, but if you do have the chops, then you most definitely should. These different orders of design just need to be heeded and the necessary care taken.

The third general area in which meta-programming can often help is pragmatics, i.e. optimizing. Typical programming styles often intertwine optimizations with the code that also expresses the meaning. This is a terrible idea, but is found everywhere.

Quite a few optimizations are carried out by violating a natural module boundary and trying to get away with it.

One can imagine designs in which the meanings are separated from the optimizations, where the latter can be turned off and the program will still work, etc. A language like Lisp or Smalltalk can actually allow features like these to be added to the language.

A language could even have module mechanisms that confine the violations needed for optimizations. A good language will allow these features to be made via meta-programming.

Another — related — feature that is both semantic and pragmatic is “trial evaluation” — to do this correctly, all side effects have to be restored on an “UNDO” or “backup”. Note that “trial evaluation” can also lead to “histories of versions” at different granularities — this can make a huge difference in both building and maintaining systems.
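
A toy sketch of “trial evaluation” (assuming nothing more than plain Python objects; the class and method names are invented for illustration): every state change is journaled so that all side effects can be restored on an undo.

# Journal each side effect as (object, attribute, old value) so a trial can be undone.
class Trial:
    def __init__(self):
        self.journal = []

    def set(self, obj, attr, value):
        self.journal.append((obj, attr, getattr(obj, attr)))   # remember the old value
        setattr(obj, attr, value)

    def undo(self):
        for obj, attr, old in reversed(self.journal):          # restore in reverse order
            setattr(obj, attr, old)
        self.journal.clear()

class Account:
    def __init__(self, balance):
        self.balance = balance

acct = Account(100)
trial = Trial()
trial.set(acct, "balance", 250)    # a tentative change
trial.undo()                       # all side effects restored
assert acct.balance == 100

Keeping the journals around — rather than clearing them — is one simple way to get the “histories of versions” mentioned above.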

A really good framework in a language for metaprogramming will have every feature of the language internally as a reflective model already, and will have a late-binding way to get to these, change them, and add to them (for example, languages like Lisp and Smalltalk are built this way, with “turtles all the way down”).

There is also the concept of “gracefulness”. By the early 80s or so, Smalltalk was pretty graceful at all levels of definition. So I was very happy to see about 10 years later — in “The Art of the Metaobject Protocol” (Gregor Kiczales, Jim des Rivieres, and Daniel G. Bobrow) — an even more graceful extension mechanism that went one level deeper to allow really important really deep extensions to be made by using an even more carefully designed initial model.

Good advice from Picasso: “Learn the Rules like an Expert so you can break them like an Artist”.

Meaning: “Just breaking things doesn’t usually produce Art”.

If the two digits of a number are multiplied together, the result is one third of the number itself. What is the number?

15 — the digits give 1 × 5 = 5, and 5 × 3 = 15 (that is, 10a + b = 3ab with a = 1 and b = 5).

Is it mostly a myth that computing advancements happened in a garage?

Well, HP started in a home garage, but their computer advances didn’t.

Apple famously started in a home garage — here we have to stretch what I’d call a “computer advancement” to cover this. But here is what Woz said about this:

“The garage is a bit of a myth,” Wozniak told Businessweek. “We did no designs there, no breadboarding, no prototyping, no planning of products. We did no manufacturing there.”

Woz did not invent or make the CPU or the memory or circuit boards etc. The nature of the code that was executed, and what it executed in, was devised by others. What he did do with some of the existing component technologies was to make a relatively simple computer that was still versatile enough to be interesting to hobbyists at the time.

The Apple II was a nicer design and Woz did a great and insightful job, especially with regard to the integration of the display, processor, and memory. I would call what he got from the components that were used: “brilliant” and “useful”. So an advance of some kind without question.

I think of “computing advancements” as things like “making the first programmable computers from primary materials”, or inventing computer graphics, or inventing and making high level programming languages, or inventing the Ethernet or Internet, etc.

In going down this long list, I couldn’t find any garages (there might still be one or two), and I found mostly people with advanced degrees of one kind or another (and, yes, there are a few exceptions here also).

In the ~ 60 years that I’ve personally been in computing (starting in 1961), I’m not aware of anything significant from a “garage” (but, again, I don’t claim to be comprehensive here).

Why did 4th Generation programming languages like SuperTalk not catch on?

I would not call SuperTalk a “4th Generation” Language, but a “different generation” language.

The system that SuperTalk operated within was called “SuperCard”. It was essentially derived directly from Hypercard, and tried to dominate that category with many additional and superior features. I thought it was pretty good, and that Apple should take up the challenge (unfortunately Apple couldn’t even value Hypercard). Other systems strongly influenced by Hypercard’s scripting language Hypertalk include, e.g., Flash.

During the first 5 years of Hypercard, Apple was able to count about 4 million end-users — of all kinds — who had made at least one scripted application. It’s worth pondering this statistic.

Neither system was able to deal well with the latter half of the double-edged sword: “Simple things should be simple; complex things should be possible”.

As an observer, I thought this was a shame, because Smalltalk overlapped very well with the Hypercard scheme, where HC really solved a number of important UI and design issues for end-users, and Smalltalk’s semantics provided the “from simple to complex” spectrum needed.

The sensibilities that were required for what was great about Hypercard’s (and Supercard’s) design had a very hard time also being able to live in the world that Smalltalk occupied.

And then there was the opportunity completely wasted — squandered — by Apple of not seeing that a Hypercard-like system was the ideal model for authoring on the newly created web in the early 90s. (The web people couldn’t see it either, and haven’t been able to since …)

In any case, my experience over the last almost 60 years now, convinces me that the general audience for “programming languages” does not choose them because they are “good” (like pop culture and the consumer business, it is not generally a “value market”).

What benefits do you get from thinking of something (tech, etc.) as a medium rather than a tool?

The way I thought about this in the 60s was that one way to define a tool is as something that helps control some of the degrees of freedom in a medium while allowing concentration on selected others.

For example, “paper” can be thought of as a medium (lots of degrees of freedom), whereas pencils, pens, brushes, etc. can be thought of as tools. At a higher level, choices of language, style, rhetoric are tools that help writing.

Computers are “meta-media” in that their content is descriptions that are universal enough to define both new media and tools for them. The most interesting ones are those media that can only exist because of being described on a computer.

The commercial/consumer use of computers hardly touches this because the customers generally want familiar old media in a more convenient form (so that almost no new learning needs to happen in order to use it).

This is “the small point” about computing, but misses the larger more important points.

It’s “a shame” that most of the “profession” of computing has also missed being able to see just what it is they are dealing with.

What is the significance of late binding?

“Late-binding” is an idea about having desired functionality while retaining as much flexibility and safety as possible.

A big one to consider is that “software” itself is a late-binding of how a computer will behave. (Seems to be a pretty good idea, I’d say …)

At the next level, the idea of index registers in hardware late-binds addresses in a way that is not destructive of code. Memory management units (MMUs) allow global relocations with encapsulated local addressing. Indirections of various kinds allow easier changes at run-time.

Consider a variable …

Consider a subroutine compared to in-line code …

An interpreter late-binds semantics of a computer.

Consider microcode as a way to late-bind the hardware via an interpreter …

An encapsulated object late-binds the “hows” (methods) and allows usage just via “whats” (meanings), and this allows various kinds of alternatives and transparencies to be more easily accomplished.

The “hows” themselves can also be late-bound — e.g. in Lisp or Smalltalk this allows programs to analyze how other parts of the system are set up: to “reflect” on what and how things are done.

(“Reflection” done well allows more pragmatic possibilities, even at a very low level, to be carried out safely and efficiently. A good book — mentioned in a comment below — is “The Art of the Metaobject Protocol” by Kiczales, des Rivières, and Bobrow …)
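
To give a small feeling for late-bound “hows” (Python standing in here for a reflective system; the class and method names are made up):

import inspect

# Callers only know the "what" (the message name "sort"); the "how" can be
# inspected and replaced while the program is running.
class Sorter:
    def sort(self, items):
        return sorted(items)                     # the current "how"

s = Sorter()
print(inspect.getsource(Sorter.sort))            # reflect on how it is currently done

def reverse_sort(self, items):                   # a different "how", same "what"
    return sorted(items, reverse=True)

Sorter.sort = reverse_sort                       # rebind at run time; callers don't change
print(s.sort([3, 1, 2]))                         # [3, 2, 1]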

From a strategic standpoint, retaining as much late-binding as possible is a very good idea. It is much easier to dynamically remove some of the degrees of freedom if absolutely needed, but really difficult to go the other way round from an early-bound system to more flexibility.

For example, quite a bit of optimization is violating module boundaries and hoping to get away with it. One way to do this in a more principled manner is to really have late bound modules as a model, and then implement the violations as “pragmatic features” of the programming system.

For example, a method in an OOP system could have a “left-hand side” that is just semantics, and an optional “right hand side” of cases with optimizations. The method should run perfectly with the right hand side turned off, but will run faster etc with it turned on.
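
A hedged sketch of that shape (not Smalltalk; the flag and function names are invented): the plain definition carries the meaning, and the optimized case is optional and can be switched off without changing the answers.

# The "left-hand side" is the meaning; the "right-hand side" is an optional
# optimized case that must agree with it and can be turned off entirely.
USE_OPTIMIZATIONS = True

def total(amounts):
    if USE_OPTIMIZATIONS and isinstance(amounts, range):
        n = len(amounts)                                   # optimized case: closed form
        return (amounts[0] + amounts[-1]) * n // 2 if n else 0
    result = 0                                             # semantic definition: always correct
    for a in amounts:
        result += a
    return result

assert total(range(1, 101)) == 5050    # fast path
USE_OPTIMIZATIONS = False
assert total(range(1, 101)) == 5050    # same meaning, plain path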

Similarly, interactions between modules can also be handled in this dual way. (For example, Smalltalk is a “message passing system” but did not manifest actual messages unless absolutely required. Also, Smalltalk had simulations of all of its semantics and could retreat to them if the lower levels on some machines were not complete or had bugs.)

It’s worth noting that deep enemies of graceful and useful late-bindings are pernicious — and especially gratuitous — dependencies. These can happen in an early bound system, but there are more possibilities in a late-bound one (so much more design is required to really take advantage of the idea — this will pay off in many ways …).

What needs to be done in order to improve Anki to reach the promise of the Dynabook's "teacher for every learner"?

There’s definitely a way to think of learning as ultimately being able to remember — and every culture has found a lot of things that need to be remembered, is able to get children to eventually remember them, and has some of their behaviors be in accordance with their memories.

But if we look at history, we find large changes in context of both what kinds of things to learn, and what it means to learn them. For example, the invention of writing brought not just a huge extension of oral knowledge, but an even more critical change of context: getting literate is a qualitative change, not just a quantitative one. A large goal of “learning to read and write” is to cross that qualitative threshold.

A change so large that it is hard to think of as an extension of the prevailing thinking patterns in the era of its birth was the invention of “modern science” less than 500 years ago. It started with the return of accurate map making of all kinds and was catalyzed by the gradual realization that much of “the world was not as it seems” and by being able to make generalizations that could generate some of the maps. One of the most important larger perspectives on this has its 400th anniversary this year: Francis Bacon’s “A new organization for knowledge” (Novum Organum Scientiarum), in which he points out that we humans have “bad brain/minds” stemming from a number of sources, including our genetics, cultures, languages, and poor teaching. He proposed a “new science” that would be a set of approaches, methods, and tools that would act as heuristics to try to get around the biases and self-generated noise from our “bad brains”.

His proposed “new science” is what today we call “science”. Before this, “science” meant “a gathering of knowledge” and “to gather it”. After this, it meant to move from knowledge to context and method and tools — and to new behaviors. This has led to not just a lot of new knowledge, but very different knowledge: qualitatively different knowledge in qualitatively different contexts.

A trap in the use of ordinary language for discussing these three contexts — oral, literate, scientific — is that things can be said and heard whether or not the discussants also have these contexts (this was one of Bacon’s four main “bad brain” traits).

E.g. people who can read but have not taken on the scientific world-view can think they understand what science is, and can learn and memorize many sentences “about” science, without actually touching what they actually mean.

Just as interesting, is the difficulty — for those who have gotten literate — of touching what is really going on — especially the feelings — in oral traditional societies. Music and poetry are bridges, but important parts of the innocence and id-ness are hard to get to. “Ecstatic music” can sometimes dominate one’s literate thought — especially when performing it.

To make an analogy here: in our society, there are courses in “music appreciation” that mostly use “sentences” about “sounds”, “relationships”, “composers”, etc., in which most testing can be (and is) done via checking “the memory” of these “sentences”.

By contrast in “real deal music”, real music teachers treat their students as “growing musicians” and play with them as a large part of the guidance to help them “get larger”, to “make Technique be the servant of Art, not the master”, etc. It’s primarily an emotive art form …

A nice quote — which has many web pages — is:

“Talking about Music is like Dancing about Architecture”

(attributed to many people from Stravinsky to Frank Zappa). If you do music, you can barely talk about it just a little. The further away from inhabiting music, the less the words can map. (And: note that the quote brilliantly achieves a meta way to do a bit of what it says is difficult …)

The Dynabook idea — “a personal computer for children of all ages” — was primarily about aiding “growth in contexts”* and my initial ideas about it were partly about asking questions such as:

“If we make an analogy to writing/reading/printing-press, what are the qualitatively new kinds of thinking that a personal computer could help to grow?”

I got started along these lines via Seymour Papert’s ideas regarding children, mathematics and computing (my mind was blown forever). I added in ideas from McLuhan, Bruner, Montessori, etc., and … Bacon … to start thinking about how a personal computer for children could help them take on the large world-view of science as “real science learning” (not “science appreciation”).

Via Papert, the dynamic math part of quite a bit of science can be nicely handled by inventing special programming languages for children. But science is not math — math is a way to map ideas about phenomena — so an additional and important part of learning science requires actually touching the world around us in ways that are more elemental than “sentences” — even the “consistent sentences” of maths.

In an ideal world, this would be aided by adults and older children. In the world we live in, most children never get this kind of help from older children, parents, or teachers (this is crazy, but humanity is basically “crazy”).

Another way to look at this is that — as far as science goes — it almost doesn’t matter what part of the world you are born into and grow up in: the chances of getting to touch the real thing are low everywhere.

Several of Montessori’s many deep ideas were key for me.

One is that children learn their world-view not in class but by living in that world. She said the problem was that the calendar said 20th century but their homes were 10th century. So she decided to have her school be the 20th century, to embody it in all the ways she could think of in the environment itself.

Another deep idea is that what is actually important is for children to do their learning by actively thinking and doing — and with verve and deep interests. She cared much more about children concentrating like crazy on something that interested them than about what that thing was. She invented “toys” that were “interesting” and let the children choose those that appealed to them (she wanted them to learn what deep concentration without interruptions was like, and that teachers were there to help and not hinder).

In other words, she wanted to help as many children as possible become much more autodidactic.

(Note that this has much in common with getting to be a deep reader or musician — it doesn’t much matter in the beginning what the titles are, what matters is learning how to stay with something difficult because you want to learn it — if the environment has been well seeded, then all will work out well. More directed choices can and will be done later. And note this is even the case with learning to speak!)

After doing many systems and interfaces over quite a few years (~25) we finally got a system that was like the Montessori toys part of her school (Etoys), and then, in a Montessori/Bruner type of school (the Open Magnet School in LA), we got to see what could be done with children, the right kinds of teachers, and a great environment to play in and with.

What never got done, was to handle the needs of children who don’t have the needed kind of peers, teachers or parents around to help them. This help is not just to answer questions but to provide a kind of “community of motivation” and “culture” that is what human beings need to be human. (The by-chance forms of this tend to be very much reverted to oral society practices because of our genetics — and much of this will be anti-modern, and even anti-civilization. This is a very difficult set of designs to pull off, especially ca. where we are now.)

---

To answer your question: the spirit of Anki is not close to what the Dynabook was all about. It could possibly be a technical aid for some kinds of patterning, but it seems to miss what “contexts” are all about.

---

Here’s another way to think of some of this stuff, and in a “crazier” fashion.

There have been a number of excellent books over the years about the idea that the “invention of prose via writing killed off ‘the gods’ ”. These are worth finding and pondering.*

The two main problems are (a) we need “the gods”; and (b) “the gods” can be very good or bad for us (“they” don’t care).

It’s worth pondering that from the perspective of science, a metaphor is a lie, but from the perspective of “the gods”, a metaphor is true.

The dilemma of our species — and ourselves — is that we have both of these processes in our brain/minds, we need them both, and we need to learn how to allow both to work**.

Learning something really deeply and fluently goes way beyond (and before) conscious thought — important parts of the learning are taken to where “the gods” still lurk.

And, just as you don’t make up reasons for breathing (which “the gods” also handle for you), the reasons for doing these deep things move from “reasoning” to “seasoning” — for life itself.

“Artists are people who can’t not do their Art”.

It doesn’t have to do with talent or opinion … This is a critical perspective for thinking about we humans, and what one of the facets of “identity” could mean … Consider the relationship between the quote above and children …

When you are fluent in music, much of the real-time action is being done “by ‘the gods’ “, whether playing, improvising, composing etc. You are not the same person you were when you were just getting started. Music can get pedantic and over-analyzed, but this can be banished by experiencing some of it that is so overwhelming that it can’t really be analyzed in the midst of the experience (this is not just certain “classical” pieces, but some of “pop” music can really get there as well). This produces the “oceanic feeling” that Romain Rolland asked Freud about.

“Goosebumps are a kind of ‘basic ground’ for ‘humanity’ ”

It’s interesting and important that “the gods” can be found at the grounding of very new contexts such as modern science, and that the two can be made to go together.***

To use this weirder way to look at things:

“Education has to lift us from our genetic prisons, while keeping ‘the gods’ alive and reachable”.

---

* For example: Eric Havelock’s “Preface To Plato”, and especially Julian Jaynes’ “The Origin of Consciousness in the Breakdown of the Bicameral Mind” (my vote for the most thought-provoking book that is perhaps a bit off).

** See Daniel Kahneman’s “Thinking, Fast and Slow”, and ponder his “System 1”.

*** See Hadamard’s “The Psychology of Invention in the Mathematical Field”, and Koestler’s “The Act of Creation”.

How do I lay out my software architecture before I code if the requirements change every day?

Here’s an exemplary process from the past that is a good start on how to think about the present, and even more importantly about the future.

A change in federal laws in the mid-80s rendered much of the public utility Brooklyn Union Gas’s database and reporting obsolete. The database covered more than 1 million customers at that time.

The CEO said “This must never happen again”. The CIO (as was often the case back then) didn’t know much about computing, but had great trust in his head systems designer and programmer Tom Morgan (who did know a lot, and also had a degree in pure math).

Morgan was friends with John Davis of Andersen Consulting, who with his group had been experimenting deeply for several years with object-oriented design (Smalltalk on workstations) — and this because of the farsighted view of Mel Berstein, Andersen Consulting’s head of Technical Services (I was consulting for Mel at this time, so I got to see what transpired first hand).

Morgan and Davis decided to go all out. First, to solve the regulatory problem, but then to include a recasting of most of BUG’s information processing. The requirements were fluid. The target had to be the two IBM 3090 mainframes that were in-house and could not be replaced with anything more suitable.

The larger story can be found in Harvard Business School case studies and in several IEEE papers. Here I’ll just look at the part that helps with the Quora question.

Designing and developing on the IBM mainframe would be a terrible route (the languages available were COBOL, PL/1, and assembly).

So they decided they would first quickly build the entire system as a working prototype in Smalltalk to make sure they had the requirements and possible architectures running and vettable (non-debuggable “requirements” are not requirements). In a large room, they set up all the workstations, and used a version of the Smalltalk system tracer to graphically render the system and print it out on large tiled charts that could be put on the walls to show a large picture of the entire system.

Every night (late!) the system would print out its organization (using an enhanced kind of E-R structure), and the tiled printouts were pasted up on all the walls of the room. In the early morning, the team would “do rounds” (as in a hospital) to see how “the patient” was doing. Questions that needed more detailed answers could be immediately answered by using the live Smalltalk on the workstations.

They would take notes, then have different kinds of meetings to decide what needed to be changed, what needed to be added, etc. The extreme late-binding and “eternal lifetime” of Smalltalk allowed changes to take place in about 1/2 second (you don’t ever have to do a systems rebuild, etc.), and they would spend the day fixing and debugging. This allowed a very neat and sophisticated design to be rapidly evolved, programmed, debugged and vetted.

Meanwhile, the underlying semantics of Smalltalk were programmed for the 3090s (in PL/1 !) and debugged. This was relatively easy because Smalltalk had a working model of its own semantics (called the “interpreter simulator” that served as a runnable model). The compromises taken here were interesting, worked well, and are out of the scope of this answer.

And there was work to make DB/2 be the backing store for the new object-base (I will admit to helping with the design of the intermediate object-cache, which was modeled on the Smalltalk-76 OOZE virtual memory swapping system at PARC of 10+ years earlier).

The architecture that was finally arrived at was essentially a real-time simulation of the company itself (and they could run the simulation ahead in time for forecasting, in the manner of a super-spreadsheet but with graphics).

Why real-time? Because they decided to add radio tracking of their 1500 service vehicles to coordinate needs, especially gas leaks (“from the smell of gas to a fix in less than 15 minutes” was the motto they decided on). The control room for this system showed large screens for Brooklyn, the locations of the fix-it trucks, and many other kinds of info (it was like a NASA or NORAD command center).

The whole process from start to finish took about 18 months, winding up with 6 people to maintain the system, after about 130 were involved in the project at its peak.

The key idea here is that CAD<->SIM done well will allow FAB to be merely a lot of work (but close to the minimum, straightforward, and without a lot of additional bugs).

We could imagine a future system much better than Smalltalk at doing the CAD<->SIM part (for example, qualitatively better semantics for capturing and running requirements would be good, and much more comprehensive real-time displays and visualizations of the whole system on the fly, etc.). We could imagine a more automated route for taking the semantics from CAD<->SIM to FAB. And so forth.

But I think, more than 30 years later, the BUG/AAC system design and process to create it stands up very well compared to anything I’ve seen more recently in large companies.

If a robot can achieve independent evolving thinking like a human being, shall we treat them like a human being instead of a machine?

An interesting dimension and question is “rights and standing”.

Robert Heinlein took up this question in the larger sense in his 1947 short story “Jerry Was A Man” (available online, perhaps not quite legally, at Jerry Was A Man).

Worth reading — as are so many of Heinlein’s writings (especially the earlier ones).

Is software engineering culture based on traditional engineering culture?

My perception is that there are so many flavors of “Software Engineering” today — including many that claim “SE is what we do by definition” — that I can’t answer this question in general. And there is certainly a wide range of approaches that is apparent, from horrendous to “more principled”.

One thing I look for is similarities to some of the strongest approaches in well established engineering disciplines — civil, chemical, aeronautical, electrical, mechanical, bio, nano, etc — and these all include superior CAD tools combined with high definition SIMulation of the CAD models (using supercomputers as needed). Many of them can then take the debugged models directly into semi- or fully-automated FABrication, which involves optimization processes, etc.

If we compare with the practices in software, it is hard to find similar levels of CAD<->SIM — remember CAD is about capturing the meaning and intent of the design in a detailed understandable way, and the SIM is to vet the meanings of the CAD. This is primarily semantic. The FAB part is to do all the optimizations of the meanings to allow practical deployable artifacts.

I’ve been trying to find at least one great example of this as of 2020.

What I have been able to find are a number of interesting FAB techniques that are better than they used to be — for example, LLVM and other bottom-level targets and optimizers — but most programmers seem to still program primarily in terms of intertwining meanings and optimizations in the same code (this is not a good approach), and the general design tools seem much too low-level.

And “SIM of meanings” doesn’t seem to exist as an operative idea today (there were some examples in the past, so I’m guessing there are some today … somewhere …)

Part of what I think has happened is that the movement from raw “hows” in the 50s upwards in level of language (towards more “whats”, and capture of meaning and intent) seems to have ground to a halt somewhere in the 80s (maybe into the early 90s?). This really makes it difficult to do a great CAD system (the language forms need to be about meanings and intentions and constraints).

The simplest answer I can come up with here is that most software efforts are not done by people who have studied real engineering — so I think the larger answer is “generally, no”. And most of the recent “software engineering” in large companies I’ve seen would also indicate “no” (but here I don’t feel that I’ve looked at a comprehensive enough survey).

But, looking at the work of some of the people who came up with both the term “software engineering” and the aspirations for it in the 60s — for example, Randell & Russell, Margaret Hamilton, etc. — I can definitely see real — and successful — efforts back then. And that probably means there is a tiny percentage of software makers today who are really doing “software engineering” where “engineering” can be used in a modern sense.

So I’m holding out for at least one exception to this gloomy picture, and I’m dying to be shown it.

What part of Burroughs Systems did you work on? Did you enjoy it? What did you enjoy most, and why?

I wrote quite a few programs for the Burroughs 220 — a vacuum tube core memory computer of the late 50s — while in the US Air Force in the early 60s. This machine was a bit like the MIX machine that Don Knuth used for his “Art Of Programming” books (Don wrote a lot of code for the Burroughs 205 that was earlier but with a very similar architecture).

While in the Air Force, I learned the B5000 in advance of its arrival, but it was a bit late and I went back to college before it arrived. This was a very different architecture from any other computer — aimed at directly supporting higher level languages, multiple processors, virtual memories, emulating “data”, and deep safety, etc. I will confess to only understanding some of this, with the rest being filled in a few years later in grad school.

I had learned enough to want to “mine” Bob Barton, the main inventor of the B5000, who was a prof in grad school (but he was famously reticent about his past achievements).

I did not work on any Burroughs HW after the Air Force (but did learn the 6500, etc without using it).

However, both the details and the idea of the B5000 were used over and over by me and other ARPA (and other) researchers, mostly emulated, especially in microcode when this was possible. It was the basis for how languages were implemented at Parc on the many different microcoded machines we wound up making.

In a talk I once claimed that the B5000 had “7 out of the 10 best ideas in software implemented in its HW”. (Less hyperbolic than it sounds …)

Did ARPA/PARC preemptively try to measure the impact of its research projects?

I’m not sure I understand the question, but as far as I know, the answer is “no”.

There were sometimes “thresholds” that were agreed to in advance by researchers.

For example, Butler Lampson urged at the founding of Parc — and we wound up agreeing — that we just had to engineer inventions so they could be used by a minimum of 100 users (meaning for HW, at least 100 personal computers, or running 100 machines on an Ethernet, etc.). SW inventions had to be done enough, supported enough, documented enough, etc. for 100 users. …

There were agreed-on thresholds within ARPA: e.g. for how many computers the ARPAnet should be able to support, and what kinds of vocabulary, etc., the joint “task force” on connected speech recognition should support. And of course, really large thresholds for the Internet.

And so forth.

What is it like to be an Apple Fellow?

This was a point in time between 1984 and 1996, so I can’t speak for earlier or later.

The history of Fellow programs goes back to the President of MIT — Jerome Wiesner — wanting to do something for his “odd duck” genius faculty members who were a bit weird for regular funding. His initial case in point was mathematician Norbert Wiener, one of the inventors of Cybernetics, who could have defined the term “eccentric”.

Wiesner decided to set up a program called “MIT Institute Professors” — not a promotable position, but an elected honor, etc. — and this included both salary for life and a research stipend that was completely discretionary: enough to get most research projects started without requiring normal reviews.

Tom Watson was visiting the large IBM Cambridge Labs and heard of this, and decided this was the perfect antidote for his button-down, conservative, essentially marketing-driven company. He set up the IBM Fellows program (which was greatly expanded by the later Chief Scientist of IBM, the very able Lew Branscomb). This was the same scheme as the MIT Institute Professors, plus one interesting power: any IBM Fellow could walk in on the president of IBM at any time without making an appointment if there was a felt urgency. There were many famous IBM Fellows, including John Backus, Benoit Mandelbrot, etc.

Other companies followed suit, including Xerox, which had about 16 or so while I was there.

Butler Lampson and I both wound up as Xerox Fellows.

When I went to Apple — as a Fellow — there was already a Fellows program, but they asked me to lay out the charter to be similar to IBM and Xerox.

So being an Apple Fellow was as described above. The small discretionary stipend for getting things going helped many things at Apple. Besides my own work, Bill Atkinson had the resources to prototype Hypercard as he wished, and this was one of the most significant end-user systems of the 80s. The basic idea is that you would have to compete with everyone else for the larger funding needed for next stages — but you could compete with a working prototype at least. Eventually Apple Marketing got so strong that it brushed aside anything it didn’t think it could sell easily, and this started a long slide downwards for Apple.

After this I took my research group to Disney where I was a Disney Fellow, and then eventually a half and half arrangement with HP (ditto) until Carly got fired and about 1/3 of HP Labs was laid off. (But these are other stories …)

Bottom line is that Fellows programs are very good ideas if the companies can adhere to Wiesner’s original design. The type of person that got chosen to be a Fellow in the old days was brilliant in important ways, very independent, already hard at work, usually underfunded, and not a good politician. The Fellow design helped all of these considerably.

How might drawing input work on a Dynabook?

Here’s the original cartoon and cardboard model from 1968.

image

You can see that one of the features is a stylus (modeled after the original RAND tablet stylus, and also the stylus and tablet we did for the Flex Machine).

In the 1972 paper I wrote for the ACM National Convention, I also mentioned that the whole face of the Dynabook could be a touch-sensitive display (this has tradeoffs as an idea, as people have discovered with iPads — it is not really suitable for lots of text entry).

Experience with the really good character recognizer software on the RAND tablet for the GRAIL system convinced us that even a perfect character recognizer (it was close) wouldn’t cover text entry — so there needed to be both a stylus and a keyboard.

It is not at all a surprise that this is still true, and I complained to Steve about it when the iPad first appeared (he didn’t draw, etc. and didn’t type much …). Why it took them 5 years is one of those pop culture and marketing mysteries …

It’s worth re-mentioning that by 1968, there were two interactive systems with pointing devices and hyperlinking: Engelbart’s NLS and GRAIL. Both had deep virtues. GRAIL felt a lot better. And, besides the plasma panel flat screen model (with power hungry pixels), there was also the start of liquid crystal flat panels from George Heilmeier at RCA and then Peter Brody at Westinghouse that required very little power:

image

George Heilmeier at RCA in the 70s with a larger LCD panel.

And, besides the ARPAnet, ARPA was also in the process of including wireless networking.

And, the Flex Machine was about the right size of computer for the first waves of VLSI integrated circuits that could go on the back of a display.

So you can see that I didn’t have to do a lot of imagining to come up with the Dynabook idea … once I saw what Papert was doing with “children and powerful ideas”, it was easy.

What are some software architecture mistakes where things were done at the wrong layer or level of abstraction?

If you knew there was going to be an Internet with many different kinds of computers, “operating systems” etc., and you realize that software “wants to run everywhere”, could you then imagine that traditional theories of “operating systems” and “programming languages” etc are quite at odds with a good future from the standpoints of both architecture and abstraction?

If you can, then you can see that “almost everything” today is way off …

Was the Plato IV system any influence on the Alto and PARC in general?

I thought I had answered this a few years ago, but can’t find it.

In 1968, ARPA convened its first grad students conference at the U of Illinois, with two students selected from each ARPA project. John Warnock and I were the two from the Utah project. John talked about the first practical algorithm for continuous tone 3D graphics (which he had invented), and I talked about a desktop personal computer with OO everything and a multiple windows UI (non-overlapping) — the Flex Machine — that Ed Cheadle and I had invented.

image

Self portrait of the Flex Machine ca 1968 on its nice HP calligraphic display

While there we got a tour of the U of Illinois and saw one of the first working flat screen displays — using a neon “plasma”.

image

This was being developed for the Plato IV system terminals at the Plato project there.

These consumed a lot of power per pixel, and on the eventual 512x512 panels of a few years later, you couldn’t light all the pixels at once (so you had to display light on dark, which is not the best in regular lighting). When Parc started up a few years later we got one of these displays (without the terminal) and hooked it to a DG Nova minicomputer to try things.

However, the pixels were bistable so you didn’t have to refresh from a memory, and the actual main reason for all this was that Don Bitzer who headed the Plato project wanted to rear project from a slide carousel that would be on top of the terminal (so high res pictures could be incorporated in the Plato lessons).

image

A Plato IV terminal sometime in the 70s (you can’t see the carousel on top). She can point with her finger because there was a cross grid of infrared lights and sensors to provide (very) low resolution pointing.

So the terminals were as large as if they had CRTs, but the memory and the projection possibilities made the risk of making a new kind of display worth it.

We liked the Plato people a lot. Like the ARPA community and Xerox Parc they just invented and built everything they wanted that they couldn’t buy.

But they and Parc were on completely divergent paths. Plato ran on a 1000 terminal time-sharing system, and the displays were slow. So what they went after was very different (some of it was quite good, and some of it inspired us to do better with the vastly more powerful/person Alto).

Back to 1968. We were aware that flat screen displays were coming, but it was very exciting and inspiring to actually see a working one. This led to discussions about when the transistors in the Flex Machine could be put on the back of a flat screen display to make a tablet personal computer (the answer was in about 10 years we thought).

I didn’t think much more about this until a few months later (still in 1968) I visited Seymour Papert and saw his wonderful work with children. It was a life’s path changer. On the plane back to Utah I drew this cartoon:

image

Suddenly, this seemed super important, because kids should be able to have fun learning and computing wherever they were. They needed “Dynabooks”!

This became a deep goal, and it was one of the motivations a few years later at Parc for the Xerox Alto (whose first name was “The Interim Dynabook”).

(The Plato people thought we were literally crazy to try this …)

What is the best way to learn Lisp in 2020?

Much of learning language today is “learning libraries”, but what’s really important about Lisp is its approach to computation, representing computations, and meta-reflection.

(Lisp is just the tiniest of universal kernels plus a “library” — so the learning of Lisp is learning its kernel in detail.)

So learning Lisp by building it completely from the bottom up is advised. I like the original McCarthy treatment of just starting out with the idea of a pair and a few primitive operations.
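
For instance, the pair and its handful of operations can be built out of almost nothing — sketched here in Python with closures (the names follow the Lisp tradition; everything else gets layered on top):

# McCarthy's starting point: a pair ("cons cell") plus a few primitives.
# Here a pair is simply a closure that remembers its two halves.
def cons(a, d):
    return lambda pick: a if pick == "car" else d

def car(p): return p("car")
def cdr(p): return p("cdr")

NIL = None
def null(p): return p is NIL

lst = cons(1, cons(2, cons(3, NIL)))     # the list (1 2 3)

def length(p):
    return 0 if null(p) else 1 + length(cdr(p))

assert car(lst) == 1 and car(cdr(lst)) == 2 and length(lst) == 3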

Peter Norvig has an excellent web page on how to make the Lisp primitives themselves in just a few lines of code from a more standard language. This is really good if you are already a programmer. It has a very nice feel to it. (How to Write a (Lisp) Interpreter (in Python))

Peter also has a nice set of links to other references at the bottom of this web-page, and also a link to a more elaborate implementation.

However, going the opposite direction is really good also. For example, if you implement an “FEXPR” version of Lisp (it doesn’t eval its arguments before a call — the callee gets to decide) then you can get rid of all the “special forms” — like quote, lambda, cond, etc — which are now just vanilla parts of the language. You can also look at some of the lazy evaluation schemes (that are today part and parcel of many “functional languages”).

The Abelson/Sussman classic “Structure and Interpretation of Computer Programs” starts from a few primitives and builds structures for most common types of programming. A really important idea here is for the student to try some others that are not mentioned (such as “pattern directed” and “publish and subscribe” and “friendly syntax”, etc. extensions).

I would most certainly urge my students to implement McCarthy’s “fluents” ideas, which allow both states progressing in time and completely functional actions on them.

Many people who take the trouble to work their way through all this will have an enormous whammy along a number of important dimensions. (One will be to realize that the “normal” approach to programming and computing is disastrously meager in its outlook.)

Can we use object-oriented methodology for designing heterogeneous complex systems?

Here is my answer to what is essentially the same question: Software systems are becoming more complex & highly heterogeneous, therefore they require better modeling approaches & implementation using multiple programming languages & technologies. Can we use OO methodology for designing such heterogeneous systems?

I read somewhere that object-oriented programming was coined by Alan Kay circa 1966 or 1967 while he was at grad school. Did anyone influence or contribute to Alan’s coining of such a term?

Since the question has been asked, it’s worth reading the detailed history I was asked to write by the ACM in 1992, which became one of the sections of the 2nd History Of Programming Languages conference. The Early History Of Smalltalk

In brief for here, I saw parts of the idea in various forms starting in the early 60s, and thought it useful, but stayed asleep until in 1966 I saw Ivan Sutherland’s Sketchpad system (which completely changed the ways I looked at computing), and within a week saw and learned the first Simula, which was less grand than Sketchpad, but showed how ordinary programming could be changed to take advantage of instantiations of processes.

This double whammy combination “rotated” me to see things from very different perspectives.

A key part of the “rotation” was that

(1) at that time multi-processing and time-sharing systems were using hardware modified to isolate separate processes in the form of “virtual versions of the hardware”

(2) ARPA was in the process of talking about doing the ARPAnet, that would allow many computers to intercommunicate

(3) my two main concentrations in college had been pure math and molecular biology

The form of the “rotation” was ridiculously simple. It was the simple realization that a computer could compute what any computer could compute, and thus you could represent anything computable at any scale using only intercommunicating computers (most would be virtual) as building blocks.

This was completely impractical (which I think was one of the reasons I didn’t think of it earlier). The molecular biology and the ARPAnet really helped, because it was known in the mid-60s roughly that each cell in our body contained billions of informationally interacting components, and we had 10 to 100 trillion cells in each of us. That kind of scaling actually worked, and was far beyond what computing could do.

I think that seeing Sketchpad shocked me into being able to use “pure math mode” as part of the thinking rather than just the “worry about efficiency” thinking I was used to doing when computing. If you allowed “infinitely fast and large” computing, then the idea made excellent sense: it was a universal building block for all scales, and what remained were the central problems of designing complex systems.

The nature of the intercommunications would allow schemes that were like algebras in pure math to be devised, so that terms — like “+” or “sort” or “display” — could have both general and specific meanings.

The huge potential got me to look at the “impractical” part, which looked much more doable than I’d thought (it still took about 5+ years and a great research group to do). LISP had already solved a number of the problems, and this proved to be a great set of ideas for context.

In the 1960s, software composites that were more complex than arrays, were often called “objects”, and all the schemes I had seen involved structures that included attached procedures. A month or so after the “rotation” someone asked me what I was doing, and I foolishly said “object-oriented programming”.

The foolish part is that “object” is a very bad word for what I had in mind — it is too inert and feels too much like “data”. Simula called its instances “processes” and that is better.

“Process-oriented programming” would have been much better, don’t you think?

In any case, I did not at all have “Abstract Data Types” in mind as a worthwhile goal, even though they were obvious — and this is because “Data” as an idea does not scale at all well.

You are much better off hiding how state is handled inside a “process”, only having processes, and treating processes as “servers” for each other.

That is what I had in mind back then.
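
A tiny sketch of that stance (Python playing the part of intercommunicating virtual computers; the message names are invented): the only way in is a request, and how the state is kept is invisible from the outside.

# A "process" hides its state completely; other processes interact with it
# only by sending requests ("messages") and getting answers back.
class CounterProcess:
    def __init__(self):
        self._count = 0                          # private: never touched from outside

    def receive(self, message):
        if message == "increment":
            self._count += 1
            return self._count
        if message == "report":
            return self._count
        return ("not-understood", message)       # the server decides what a message means

counter = CounterProcess()
counter.receive("increment")
counter.receive("increment")
assert counter.receive("report") == 2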

Software systems are becoming more complex & highly heterogeneous, therefore they require better modeling approaches & implementation using multiple programming languages & technologies. Can we use OO methodology for designing such heterogeneous systems?

I’ll rewrite this question:

What we want to do with software is generally more complex than the combination of our tools and understanding can handle reasonably: what should we do?

We need to think about the best forms for decent solutions, and the best ways to get to the best forms.

A form I like for software in general at any scale — and especially as scales get larger (e.g. approaching the Internet in size) — is that of a system of safe modules that can intercommunicate to get synergies and which can be modified and added to without having to stop the system.

(Note that this is also a description of the Internet itself.)

I also like to have the interiors of the modules be in the same system form (and turtles all the way down). However, the safeness of the modules (meaning they can’t hurt other modules or be hurt by them) allows all manner of interior forms, including really bad ones.

“Communication” between entities means there exists a shared context — no matter how small — in which both entities have similar enough theories. It is easy to see that descriptions* in a “small language” will scale better than hoping for large comprehensive vocabularies in a vast library.

There is no reason not to have communications both within physical computers and to modules within other physical computers.

Part of “safeness” means that communications are no more than questions/requests (it would be a disaster to allow imperative actions by one module on another) — a main point about modules is to be able to do things whilst minimizing what any module needs to understand about another.

Communicating in terms of descriptions when needed will require the intercommunication scheme to be able to express meanings and relationships and intentions in a richer way than most existing computer languages.

A generalization of “publish & subscribe” is needed — e.g. “gifts” and “wishes” — where the “advertising” for both is in the “little semantic language”, so that the communications-brokerage can find useful matchups.

And this semantics has to extend if we want to (or need to) share a module or give it to another module. There might be cases in which a module will have to be unpacked from its context to the context of the unpacker: this can only be done if communication can be reached on what this means. **
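
Going back to the “gifts” and “wishes” brokerage: a deliberately oversimplified sketch (the “little semantic language” is reduced to bare keyword tags here, which is far weaker than what is actually being proposed, and all the names are invented):

# A toy broker: modules advertise "gifts" (what they can provide) and make
# "wishes" (what they need), each described by a small set of tags.
class Broker:
    def __init__(self):
        self.gifts = []                          # (tags, provider) pairs

    def offer(self, tags, provider):
        self.gifts.append((set(tags), provider))

    def wish(self, tags):
        wanted = set(tags)
        # a matchup is any gift whose description covers the wish
        return [p for t, p in self.gifts if wanted <= t]

broker = Broker()
broker.offer({"sort", "ascending"}, sorted)      # a gift: the built-in sort
matches = broker.wish({"sort"})                  # a wish: anything that can sort
assert matches and matches[0]([3, 1, 2]) == [1, 2, 3]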

One of the bugs in current day systems that are called “OO” is that they are too often used as “Abstract Data Type” languages, and this form of programming doesn’t scale at all well, nor does it handle the needed semantics well. And most of them lack a deep enough reflective system to be able to gracefully convert them in all the ways that are needed. And there are many other problems …

My inclination would be just to make a new language (in the same way that the Smalltalks were made at Xerox Parc) to handle today’s scales and needs.

---

* Licklider wrote his first “Intergalactic Network” memo in 1963 when he couldn’t understand why, when programming in JOVIAL on a Q32 in Santa Monica, he couldn’t use a library function written in FORTRAN for a 7090 at MIT, given that the two machines were connected by a telephone link. (The “Intergalactic Network” is now called “The Internet”.)

This led to the question of how you could find a library routine if you didn’t know its name. One of the first languages to solve part of this problem was Smalltalk.

** Note that CORBA and many other half-hearted attempts declined to actually try to solve these problems (but they are indeed solvable).

How important was William English to the evolution of the personal computer?

Bill English was one of my oldest friends and colleagues, and the simple answer is “enormously important”.

Doug Engelbart was a great visionary — cosmic in many ways — and Bill — as the co-director of the “Augmenting Human Intellect” project and the chief engineer — was the person who both made things happen and found and mentored the highly talented NLS researchers, especially including Jeff Rulifson, who was the main software architect and designer.

I like the ACM’s “Software Systems Award” better than the Turing Award because it can be given to as many people as have been central to a particular software invention and creation. The 1990 ACM SSA was given to Doug, Bill, and Jeff for NLS.

The famed “Mother Of All Demos” (see The Mother of All Demos - Wikipedia) was designed and created by both Bill and Jeff.

Bill had theatrical experience, understood the magnitude needed for the “show”, designed it, made it happen, specified most of the engineering underneath it, and was both the rehearsal director and the live director.

Jeff was the main software arbiter and one of the central designers and implementers for what was attempted and shown. The scope of this was quite amazing (and beyond a Quora sized answer).

Bill was also one of the best human beings I’ve ever had the pleasure to be around. If he saw that he could help, he would help.

He helped me in many ways at Parc, especially in the early days when, quite fresh from grad school, I was trying to understand how to set up a research group.

We loved him then, and we love him today.

How do I make my computer more interactive?

This question sounds like a homework assignment. If so, you could learn quite a bit by trying to answer it by poking around yourself. And, ditto, if it isn’t a homework assignment.

Why do good ideas often not scale?

This was a favorite utterance of the great systems (both hardware and software) designer Bob Barton. His exact quote was:

“Good ideas don’t often scale”

It was both a cautionary heuristic and a practical observation.

For example, anyone can make a doghouse from almost anything, but trying to make one that is 100 times larger will generally fail: the strength of many materials goes as the cross-section — so the scaled up doghouse will be 10,000 times stronger — but the mass goes by the volume, so it will weigh 1,000,000 times more. Scalewise, it has gotten 100 times weaker and will collapse (and so will a scaled up dog, for the same reasons).

In simple programming, we can look at e.g. the possibilities for pair-wise pernicious interactions between components (which could be just lines of code). There are N ways to pick one of these, and N−1 ways to pick the second one — roughly N×(N−1) possible pairings, growing as the square of the size of the program — and this means that scaling simple programming is likely to give rise to “bad things happening”.

This simple observation is one of many good arguments for devising module schemes that are completely encapsulated.

If you look at the design of the B5000 hardware and software (1962), the amount and degree of encapsulation at every level is breathtaking — not just by the standards back then, but today. And, the B5000 was the first by many years to have its operating system completely written in a higher level language (a form of Algol extended for systems programming). This was another way to reduce pernicious interactions. The results were astonishing with respect to crashproof integrity.

This did not prevent other vendors like IBM from scaling bad ideas (IBM was a marketing company and, if they could sell something successfully to their unsophisticated customers, that was what they measured themselves by).

Another of the many ways to ponder this principle is to notice that human capacity for simultaneous ideas is very limited (for simple things: 7±2, and for most things, more like 4±3). A situation that can be understood inside this limit can quickly get out of hand when it is exceeded.

Yet another way is to look at other human limitations that are called “cognitive biases” and find the ones that are prone to disaster when what is being thought about is scaled.

For example, we have a very hard time dealing with growth that is not linear using our “commonsense” alone. If we know that, we can use tools — like arithmetic — to compensate by calculation. But note how easy it is for most people to be taken by surprise by an easily calculable exponential growth curve, such as COVID-19.

Another one is “confirmation bias”. It can take an idea that works in limited form and transform it into “the greatest idea of all time”. We see this in many areas of our culture (and unfortunately in our governments).

Back to this as a mantra: it is one of the most helpful, and worth using. I would call “good ideas don’t often scale”, “a great observation that does scale”!

In a talk Alan Kay gave, he said there are several reasons (6 or 7 reasons) why America no longer hires and makes a good use of its smart people and scientists like it used to in WW II and in The Cold War, can he mention them?

I don’t think things are as total as the question implies. But — with regard to funding for computing as an example — I don’t think there is any question that the government funding for “edge” computer science is very far from the same quality and kind as it was in the 50s and especially the 60s.

The fruits of the funding that was done back then gave rise to a number of very large world wide industries, new technologies, personal computing of all kinds, pervasive networking of all kinds, and created many 10s of Trillions of dollars of new wealth.

The return on investment was enormous, and one would think that the government would want to foster more “edge” research of this kind for both the uplifting of humanity and the new kinds of commerce it gives rise to.

I don’t know in a scientific way why this isn’t being done today, but we can make some guesses that won’t be too far off.

First, it’s worth noting that “people in general” know little about science and technical fields, are not interested in science, and in fact distrust technical people to a considerable degree. This is stuff that is not normal to their culture, and what is not normal to a culture seems “crazy” and “dangerous”.

So it is usually only when they are deathly afraid of a perceived danger that they are willing to call on technical people for help, and especially to fund them. War and the threat of war have been the biggest stimulants to “unusual” research, methods and people.

(An interesting example today is the difficulty “people in general” have in getting afraid enough of COVID to even protect themselves and their own families, let alone listen to people who are specialists in epidemics explain what is going on and the penalties for not acting. Similarly, and even more disastrously, they can’t get afraid enough of the climate problem to politically back what needs to be done.)

For the rest of this question, I’ll point to a white paper I wrote last year for a foundation in the UK about how maximum efforts have come about, and how they have been organized. There is also a part of this that talks about why there is such resistance. Some of the reasons discussed there include:

Myopia in time and environment

Amnesia (quickly forgetting past difficulties)

Optimism (“things will work out”)

Inertia (especially where there is uncertainty)

Simplification (cognitive load, etc.)

Herding (basing decisions on societal consensus)

Other (aversions to outside the tribe)

Loss Aversion

Dunning-Kruger

White paper: https://internetat50.com/references/Kay_How.pdf

All the references from that talk: Alan Kay - References

What was the very first computer science fact you learned that blew your mind?

My undergrad majors (in the early 60s) were pure math and molecular biology, and I worked my way through college as a “data systems” programmer for the National Center for Atmospheric Research in Boulder, Colo (basically writing sorting and retrieval routines for the mountains of weather data that were used for analysis, and for one stretch, writing systems code for the CDC 6600, including several simulators of it and for it).

I knew I didn’t know anything about computing, but like most programmers I could still write and debug code to do this and that.

On the other hand, from the standpoint of pure mathematics, the computer was pretty simple and straightforward — and finite. The tricky parts came from the huge degrees of freedom over time, and from the sheer size of the assertions and implications that our weak human brains had to handle, compared with the way regular maths were approached as small, usually infinite, generalizations.

The notions of “artificial intelligence” also were unmysterious philosophically to those deep into where molecular biology had gotten to ca 1965 or so. Again — as with life itself — what was tricky was the amount and kinds of organization required to get simple building blocks to “think”. This seemed like an extremely difficult, but eventually solvable problem.

Simulating one computer with another — a la Turing — also seemed straightforward, and I had to do this several times while at NCAR.

After graduating, I was burnt out and decided to take a year off to learn a bit more about computing instead of plunging into grad studies in my majors (or to get a regular job).

I quite accidentally wound up at the U of Utah grad school in computing in late 1966, and in my first meeting with the head of the department — Dave Evans — was handed a copy of Ivan Sutherland’s MIT PhD thesis “Sketchpad: A Man-Machine Graphical Communication System” from just a few years earlier in 1963.

This was the first time my mind was completely blown by something in computing.

This is because every part of it took something where I understood the category — such as a computer can make pictures, and a computer can simulate another computer, you can program a computer, etc. — and provided a completely startling example that I had not dreamed of.

I don’t know how to sort out the impact from what it was specifically (as the first great invention of interactive computing/declarative programming/object oriented design/meta bootstrapping of tools, etc) from what it was philosophically: a demonstration that “simulating a computer on another computer” could be qualitatively different and superior in kind.

Both of these hit me/bit me hard, and for life. The first was “it’s the kind of thing we must invent for everyone”. The second was “it’s a new artistic medium of vast scope that cosmically transcends its materials”.

Has the fixed architecture of microprocessors turned out to be a wrong turning, given that many functions can be more easily performed in a programmable gate array which are rather more flexible?

This question preaches to this particular choir, because of the great success at Xerox Parc in the 70s of the various microcoded personal computers done there. And before this, microcode (invented by Maurice Wilkes of EDSAC fame) had also been used in significant ways on significant computers.

Part of the reason for the success of microcoding had to do with the large differentials in speed between CPU logic and most memory cycle times (this was often 5–6 times in favor of CPU logic, and meant that quite a bit of emulation could be done with no loss of systems speed).

FPGAs overlap on some of the same territory, but with different tradeoffs — and they had a long development curve: it took quite a few years for the combination of enough gates and high-yield, optimized “soft wiring compilers” to get FPGAs to be practical materials. One of the tradeoffs is e.g. slower clock speeds but immensely more possible parallelisms and commingling “processing” with “memory” …

Bottom line for me today: it makes a lot of sense to have hybrid systems that combine both fixed and variable low level functions (for example: Intel has a rather large line of offerings which combine an ARM-like processor with a fair amount of FPGA real-estate).

If threads were allowed to share the stack memory, what complications will this create? Please make a list and discuss it.

This sounds like a homework assignment, and — if so — I don’t think you should try to get Quorans to do the work for you.

Are there any university math departments headed by a computer scientist?

Good question (and I don’t know offhand). But there are any number of computer scientists who are deep enough mathematicians to qualify (e.g. Don Knuth, etc).

Why is it hard to make software compatible for multiple operating systems?

It actually isn’t very hard if you are willing to put in some extra work and care (and these extras could be considered “hard”, but they are up front prep).

A classic route was to make a virtual machine that is hardware independent by using an implementation route available on all machines — for many years, that would be a subset of C (where it is used as a “universal assembly code” target for the compilers you have made).

Then you ask about the minimum you need from an OS (rather than the opposite) in order to work on a particular machine and OS. This is usually access to signals coming in — especially from the UI — and to be able to hand off a bit-map for the OS to display on the screen. You will have written your own file system, but you need to map it to the one the OS actually uses.
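As a sketch of what “the minimum you need from an OS” might look like, here is a hypothetical platform layer (my own names, not Squeak’s actual VM interface); the portable system lives entirely above it, and porting means re-implementing only these few calls per platform:

```python
# Hypothetical minimal platform layer: input signals in, bitmaps out, raw storage.
from abc import ABC, abstractmethod

class Platform(ABC):
    @abstractmethod
    def next_event(self):
        """Return the next input signal (key, mouse, timer tick), or None."""

    @abstractmethod
    def show_bitmap(self, pixels, width, height):
        """Hand a finished bitmap to the host OS to put on the screen."""

    @abstractmethod
    def read_block(self, block_number):
        """Raw storage read; the portable system keeps its own file system on top."""

    @abstractmethod
    def write_block(self, block_number, data):
        """Raw storage write for that same portable file system."""
```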

When Squeak Smalltalk was done ca 1996 it took volunteers a few days to a week to get the system running bit-identical to all the other Squeaks on other platforms. In practice, the “compatibility code” amounted to between 1,000 and 2,000 lines of C.

Squeak Smalltalk was written in itself and was packaged as a virtual memory for a computer. The lowest levels were written in a subset with a translator into C.

Bootstrapping can be fun and easy if it is part of the overall plan for doing software. It is often quite difficult to try and bolt it on afterward.

A key idea in first bootstrapping an interpreted VM is that once you get the system going on the new platform it can then supply tools for any needed extra optimizations.

Now there is a second route, which is to use JavaScript as “an assembler” target of one’s own compilers. The architecture of the browser is “a bit random”, but the JavaScript implementation is highly optimized, and the storage allocator is good enough. If JavaScript were more reflective, then one could write much better debuggers. As it is, using it as a “machine”, one can put more felicitous languages on top that do have much better debuggers, etc.

I think this makes a lot of sense today because — with all its flaws (and there are many) — the browser environment is the only automatic universal deployment environment available on the Internet right now, and there are more and more examples that show that the optimizations of JavaScript, and the possibilities for WebAssembly when needed, make this a decent target for one’s own systems.

What did John von Neumann contribute to computers in contrast to Alan Turing?

Alan Turing in the mid 30s explored computability*, which was triggered by interest in Goedel’s demonstration that most interesting systems of logic were not completely decidable.

Turing did this by inventing a thought experiment that could do the simple human actions of following a limited form of recipe and writing and reading marks on a “tape” (like a long strip of paper).

The most important of the results showed that a very simple machine was capable of simulating any machine (this is called a “Universal Turing Machine”, and many very simple versions were subsequently devised).
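A minimal sketch (my own toy notation, not Turing’s) may help make the point concrete: the fixed run loop below is the “simple machine”, and any particular machine it imitates is just a table of rules handed to it as data:

```python
def run(rules, tape, state="start", blank="_", max_steps=1_000):
    """One fixed, very simple interpreter that imitates whatever machine `rules` describes."""
    cells = dict(enumerate(tape))            # sparse tape: position -> symbol
    pos = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = cells.get(pos, blank)
        write, move, state = rules[(state, symbol)]
        cells[pos] = write
        pos += {"R": 1, "L": -1, "N": 0}[move]
    return "".join(cells[i] for i in sorted(cells))

# A description of a unary "add one" machine: skip over the 1s, append another 1.
increment = {
    ("start", "1"): ("1", "R", "start"),
    ("start", "_"): ("1", "N", "halt"),
}

print(run(increment, "111"))   # -> 1111
```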

Von Neumann was one of the great mathematicians of his day. During WWII both he (in the US) and Turing (in the UK codebreaking project) separately got involved with mathematics and machines that could “do math”.

It wasn’t just von Neumann who realized that Turing’s thesis model actually could make a lot of sense as a physical computer. This is because the equivalent of the “tape” (some form of “memory”) can be a lot cheaper than the recipe-following hardware — which could then be made as small as possible, because more complex computers could be emulated.

But von Neumann’s influence in pushing this idea was very important in getting some of the first programmable computers to be developed after WWII.

(There had already been several examples: in Germany by Zuse, and in the US by Atanasoff-Berry).

And in the US progress was slow enough that Maurice Wilkes from the UK, who had attended the seminars in the US, decided to “just do one”, and this resulted in the EDSAC at Cambridge ca. end of ’48 (which is my vote for the first available programmable computer**).

In practice, the “simple logic” part of these early computers was made much more complex than Turing’s theories demanded. This is because everything was really s l o w , especially given the memory technologies available. This got the designers to want to put more into the fixed part of the machine and do less emulation using the “tape”.

This mismatch of speeds is still the case. The intermediate ground of making the fixed part of the machine be an emulator has been a very good solution (it made the work at Xerox Parc possible), but is oddly not found in most commercial CPUs. This has affected many things adversely, including software development, design of higher level programming languages, etc.

---

* See the helpful comment below by Robert Langner, which includes a link to a short history of what Turing did in the 30s.

** The Manchester “Baby” worked just before EDSAC, but I think it was too rudimentary to count — it was a test machine. (But see Clem Cole’s harrumph below.)

The Manchester Mark I worked for the first time a few months after EDSAC. (Not that priority means anything, but there is still the friendly rivalry between Cambridge and Manchester …)

What are old technology systems referred to as?

Normal

(“Technology” is all that stuff that happened after you were born …)

Why do people often attribute inventions with only a single genius?

Joseph Campbell’s book “The Hero With A Thousand Faces” provides part of an answer. In this view, humans are not only genetically wired for language and stories, but we might even have some story tropes partly built in. In any case, cultures can preserve tropes, genetics or not.

One of them is “The Hero’s Journey”. One of many reasons why this appeals is that it simplifies the problem of making up a story and telling it, by concentrating the action on a single hero. It helps that there are “prime movers” in every society who have a bit more “oomph” to get things to happen.

Journalists love “The Hero’s Journey” trope, and so do most people.

In telling stories about Xerox Parc, I have moved back and forth between the really amazing individuals who made difficult things happen regardless — Butler Lampson, Chuck Thacker, Dan Ingalls, etc. — to trying to show just how much of the ideas had important prior art in both the ARPA community and computing in general, and also the large process influences from WWII and Cold War projects that we learned from.

Still, the sequentiality of stories and the limitations of short term memory cater to simplified versions. Real history is a big tangled wall chart of influences.

And … though e.g. Newton somewhat required the 17th century and some of the past, what he did by himself is so amazingly staggering that you have to consider him a single genius.

What term describes an instance of a program being executed?

In antediluvian times it was called a “process”. I think terms have gotten mixed together and less clear these days …

Which book has provided you the most tangible benefits in your life as a computer scientist?

Most of the writings that got me going early on (starting early in the 60s) were papers.

People who read a lot can never come up with “a single book” for anything.

I’ll just stick to the 60s. Books whose ideas and perspectives had a huge influence on me back then with regard to computing included:

“The Molecular Biology Of The Gene” (1965) by James Watson (I was learning molecular biology, and this spilled over into computing after I started grad school in 1966). It’s all about being able to make really complex things from gazillions of simple parts (both the complexity and the scale were far beyond computing, and this helped a lot, as did some of the solutions of Biology).

“Understanding Media” (1964) by Marshall McLuhan. I didn’t understand it when it came out, but spent the summer of 1967 learning how to understand McLuhan (partly by starting with “Gutenberg Galaxy”), and this was a biggie. What McLuhan was driving at is one of the most important perspectives on human beings.

“Sketchpad: A Man-Machine Graphical Communication System” (1963) by Ivan Sutherland. His thesis — so it counts as a book — probably had the largest effect on me vis a vis computing when I first read it in 1966 (and to this day). It is the epitome of so many important ideas, perspectives, aspirations, etc. in computing.

“Semantic Information Processing” (1968) ed. by Minsky. This essentially was digests of the first wave of AI theses by Minsky’s students, plus several important papers by McCarthy and Minsky. Some of them were revelatory (e.g. the Thomas Evans Geometric Analogy solver, and the first McCarthy “fluents” paper).

“Computation: Finite and Infinite Machines” (1967) by Marvin Minsky. A truly lovely book — another “kind of how you should be thinking about things”. Automata theory was already off the main track in computing, but this book was so beautifully done (and I also had a UG degree in pure math, so …). It also got me to look at:

“Lisp 1.5 Manual” (1964) by John McCarthy et al. The front part of this is revelatory and struck me as “kind of how things should be done”.

“A Programming Language” (1962) by Ken Iverson (I read it in 1967). APL had not yet been implemented, but this book was hugely influential on many computerists. A really important different perspective on programming. Put this together with Lisp and Simula and Sketchpad, and you can loosen up your thoughts about programming.

“An Algol 60 Compiler” (1964) by Randell and Russell had the wonderful ideas — which were pulled off with a great implementation — of having models of programs and of having parallel, semantically exact implementations for different purposes (e.g. an interpreter for programming and debugging, and an optimizing compiler for production runs). This fit nicely with the advent of CAD and SIM (with Sketchpad and other early systems), along with “automatic FAB” to get a final product; that was the kind of thing that engineers concerned with computing could see happening to their fields (including the field of computing). Ironically, to this day, computing in general has done the least with CAD<->SIM->FAB (except in the remote reaches of computing research in ARPA and Parc, etc.).

Are there necessary compromises between making a computer a consumer convenience along with being a tool for augmenting intellect?

One way for you to answer your question is to consider the technologies of reading/writing and their extension into the mass world via printing. If you start with the larger world of “augmenting intellect” via reading and writing and printing, it is very easy to also use these tools for “consumer conveniences”.

If you start with consumer conveniences, you will have a much harder time along many dimensions to get to the structurings needed for the larger ideas.

I think this applies across the board for computing. A bunch of us put in a lot of effort in the 60s and 70s to make a version of “the large ideas” that could span the world and be a very new kind of “book” for everyone.

Some people use this full out. Far too many never realize what they are missing (this is because computers can also simulate lots of stuff that is actually “anti-augmentation of intellect” (namely “diminution of intellect”) e.g. Twitter).

How did people at PARC think about AI?

A short answer here, and I’ll likely get back to this for a longer one.

Parc had from the beginning some of the top people in AI, including Danny Bobrow, Terry Winograd (a consultant), Mark Stefik, John Seely Brown, Richard Burton, Ira Goldstein, and many more.

A number of impressive systems were done, including KRL, Steamer, etc.

How was the first computer operating system made if there were no programming languages?

There are two ideas here (a) “program” and (b) “computer”.

Programs have been around for a long time as directions to be read and carried out (early on just by humans). Heron of Alexandria (ca 30 AD) made a variety of mechanical devices that were controlled by “programs”, etc.

A computer is a machine that does work on information. You can make a simple adding computer with two rulers:

[image: adding fractions by sliding one ruler along another]

This shows the addition of 2 5/8 + 15/16, with the result 3 9/16.
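(Just to spell out the arithmetic the rulers are doing — my check, not part of the original answer:)

```python
from fractions import Fraction

print(Fraction(2) + Fraction(5, 8) + Fraction(15, 16))   # 57/16, i.e. 3 9/16
```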

To get a programmable computer we need to make a machine that can read a program of some kind, and follow its directions.

By this definition, the Jacquard Loom is a wonderful hybrid computer: it works with information but its results are woven patterns in cloth.

The Jacquard Loom was controlled by programs punched into a sequence of cardboard cards.

[image: punched cards for a Jacquard loom]

The cards are strung together and read in sequence. The program controls which threads are lifted in the loom part to the left.

Programming can also be done by “plug-boarding”, where the program is given in terms of connections of wires. For example, here is how a punched card accounting machine was programmed:

[image: the plugboard of a punched-card accounting machine]

This was actually quite fun to do.

When most people say “computer” today, they mean a “stored program computer”, where the program is represented in the very same memory structures as the “data”.

In operation, both the data and the program are usually represented as a “word” of binary digits — each one “on” or “off” — perhaps 16 of these. If you look at all the patterns for 16 on-off things, you will get 65,536 of them. You could decide to interpret these as positive integers, or as signed integers from −32,768 to 32,767, or as two or three text characters, etc.

And you can also decide that the first 3 binary digits could represent 8 different kinds of operation for the machine to carry out, and the remaining 13 digits could be the “addresses” of 8192 of the binary digit words stored in the “memory”.

Then a program would be a sequence of these operation words held in one place in memory, and part of the hardware of the computer would look at a word and do the operation indicated, and then go to the next word and do that operation, etc.

The result of the operations would be to get data words from other parts of the memory, do the indicated operations, and then put the results back into memory.

I went through this in some detail in order to answer your question.

I think you can now see that “programming” is quite independent of having a machine to program. A program — in the terms we are using here — is “information” that “describes” a sequence of actions that can be “interpreted” by processes (could be in human brains or some form of machinery).

In our example above, we can program our first computer by hand, by directly writing a sequence of 16 digit words with the on-off notations — a mark or not, or a hole or not — that correspond to the instructions we want to have carried out.

These can be entered into our computer using switches on the panel, punched paper tape or cards, etc.
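To make the word layout and the fetch-and-do loop above concrete, here is a toy stored-program machine (my own invented operation codes, following the 3-bit-operation / 13-bit-address scheme described above):

```python
MEM_SIZE = 8192                                  # 2**13 addressable words
LOAD, ADD, STORE, JUMP, HALT = range(5)          # 5 of the 8 possible operation codes

def word(op, addr):
    """Pack an operation and an address into one 16-bit word."""
    return (op << 13) | addr

def run(memory):
    acc, pc = 0, 0                               # accumulator and program counter
    while True:
        instr = memory[pc]                       # fetch the next word of the program...
        op, addr = instr >> 13, instr & 0x1FFF   # ...and unpack its 3+13 bit fields
        pc += 1
        if op == LOAD:    acc = memory[addr]
        elif op == ADD:   acc = (acc + memory[addr]) & 0xFFFF
        elif op == STORE: memory[addr] = acc
        elif op == JUMP:  pc = addr
        elif op == HALT:  return memory

# Program and data live in the same memory: add word 101 into word 100.
memory = [0] * MEM_SIZE
memory[0:4] = [word(LOAD, 100), word(ADD, 101), word(STORE, 100), word(HALT, 0)]
memory[100], memory[101] = 21, 21
print(run(memory)[100])                          # -> 42
```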

A somewhat different answer I wrote a while ago might also help:

How were computer programs and languages like COBOL installed into computers back in the 50s before software?

Is there a programming language that uses past and future tense?

Some very confident (why?) answers here about “no, there aren’t programming languages with past tense”. Or “there isn’t any reason for such a thing”.

But take a look at Lucid, by Wadge and Ashcroft. Its variables are histories of their values, so the various kinds of past tense can be used. See Lucid (programming language) - Wikipedia.

Also, there have been a number of experiments and proposals for “world-lines” in programming, especially “good old real AI” programming. These go all the way back to John McCarthy’s “Situations, Actions, and Causal Laws” papers in the 60s. A “situation” is a past whole state of a system, and it is reached by a “past tense” operation.

These ideas are important on a number of fronts (for example, think of the many ways that a “general UNDO” at any granularity could vastly help programming and debugging). Both databases (usually) and user interfaces (sometimes) have unlimited undos and looks at the historical past, but this is also very useful for finer grained programming, AI, etc.

Histories and tenses can be implemented in languages with a decent meta-framework (e.g. Lisp, Smalltalk, etc.). Here is a paper about such an experiment: http://www.vpri.org/pdf/tr2011001_final_worlds.pdf
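As a toy illustration (my own sketch, not the design in the paper linked above), here is the core of “variables as histories”: every assignment is remembered, so past-tense access and a general UNDO come almost for free:

```python
class History:
    """A variable whose value is the whole history of its values."""
    def __init__(self, value):
        self.values = [value]

    def set(self, value):              # assignment appends instead of overwriting
        self.values.append(value)

    @property
    def now(self):                     # present tense
        return self.values[-1]

    def past(self, steps_back=1):      # past tense: the value some assignments ago
        return self.values[-1 - steps_back]

    def undo(self):                    # roll the variable back one assignment
        if len(self.values) > 1:
            self.values.pop()

x = History(3)
x.set(4)
x.set(5)
print(x.now, x.past(1), x.past(2))     # 5 4 3
x.undo()
print(x.now)                           # 4
```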

What did you think of How Did We Find Out About Computers by Isaac Asimov?

I like Asimov, but this book doesn’t explain anything that would help a child learn how computers work — how to make them, program them, get them to do a few semi-intelligent things, etc.

What is the story behind Alan Kay’s adage “Simple things should be simple, complex things should be possible”?

I think I came up with this slogan at Parc during discussions wrt children, end-users, user-interfaces, and programming languages. Chuck Thacker (the genius behind the Parc hardware) also liked it and adopted it as a principle for many of his projects.

So e.g. Smalltalk needed to work with children and end-users even more intuitively than (say) JOSS or Logo. But we also wanted to write the entire system in itself, so that those who were curious — especially later on — could “pop any hood” in the system and see a live program/object written in exactly the same terms as what the children were learning.

Similarly, the GUI had to be easily learnable by children, but — looking ahead — it had to handle “50,000 kinds of things we hadn’t thought of done by 50,000 programmers we hadn’t met” and be as simple as possible.

Another part of this was that we were determined to have a very easy-to-learn UI that would also incorporate end-user programming (scripting) as a natural part of it — in other words, to combine what had to be simple yet possible in the programming language with what had to be simple yet possible in the UI.

The general zeitgeist was against this idea — both back then and now. Basically: those artifacts that do simple things usually wall off the next levels of complexity, and those that do complex things don’t do anything simply.

But, given that there have been some really good examples of how to do both, it’s hard not to see most computer people as (a) not caring, or (b) being lazy or unskilled, or (c) both.

It’s worth pondering this. One argument against mine is that “people need and want ‘appliances’” that only have one function.

I would say, “Well, that’s the simple part, now show me how the next idea and need the end-user has that is not directly in the surface appliance can be realized” (this is the enormous difference between a computer artifact and anything made from atoms — and it means that the analogy to appliances is not a very good one). There is just no reason not to allow and make possible safe “hood popping” …

As a software engineer, if you were the professor of an "Introduction to Programming" course at a college, what programming language would you choose to introduce newbies into the world of programming? Why?

For me, this question is out of the scope of a Quora answer.

I’m a full time researcher, but also an adjunct professor of “CS” at UCLA (which means I don’t have to go to faculty meetings, I can teach a class when I want to meet some students, and I can choose to work for free (which I choose to do)). I usually will do something in grad school or upper division (and usually with other professors).

The problem with teaching a “101” course in most places in the US is that there are enormous demands for the course to be a prerequisite for the courses to follow. Nothing that I would choose to do would satisfy most (or any) of the constraints.

But I think a good way to approach this is to choose “project types” and look at various ways to handle the project — first, if humans had to do the project, and then to use a “pseudo-code” to write down the strategies and tactics.

Some projects are like assembly lines, some are like simulations of systems, some are like “constitutions”, some are like “ethics”, some are like mathematics, some are like building a bridge, etc. Styles of programming and languages have evolved to try to supply better tools for each kind of problem area.

I think having many perspectives at the beginning is critical because many people seem to be like Lorenz’s ducklings and think the first programming language they see is “their momma” and then follow this around for the rest of their lives. This is a bad idea even if the first programming language is on the “relatively better” side of this shifty tribe.

Many programming languages are rather like the underlying hardware, and this is a distraction from the big idea of computing: if you have a computer you can make any kind of computer that will better serve your needs. This is a meta-idea rather than a “slogging-idea”. It is the one least heeded by most programmers in industry.

Since any computerist who should be teaching about programming — and especially “software-engineering” — should know enough to make a pseudo-code run (especially on top of some dynamic substrate e.g. Javascript), I would make a programming environment for the class in which the different kinds of strategies and tactics can be explored.

One of the most fun classes I was involved in a few years ago at UCLA was with professors Todd Millstein and Alex Warth, in which we used Alex’s “language generator” — called Ohm — to quickly make up programming languages for different styles (think SICP but with nicer looking things to work with, especially for beginners). The resulting languages were interpreters but ran plenty fast enough to explore a wide range of projects.

The students for this class were not beginners, and the class was about inventing, designing, and making languages, but the tools involved could easily be put to the purpose of making a pedagogical language for beginners.

If successful, the students would come out at the end with many different perspectives on programming, be able to do them to a reasonable extent, and (I would hope) also with a profound dislike of most of the existing programming languages in the world, especially the most used ones. Some of these students will go on to invent much better ways to program (as opposed to the mass of programmers who think coping with misfit tools is being “heroic” instead of “yikes!”).

In other words, I don’t think the primary purpose of college is for vocational training, especially in programming (there are trade schools for just this purpose). The main purpose of college is to provide enough new and different perspectives to raise one’s level of thinking both overall and with regard to the subjects taught. And this should especially be the purpose of college in the first few years.

Is the second best way to predict the future to fund it?

Note that the easiest way to predict the future is to prevent it. Or do nothing. Then the future will be like the present and the past (as it was for hundreds of thousands of years after humans appeared on the planet).

And that the phrase “the best way to predict the future is to invent it” is about willpower more than goodness of goal (the politicians in Washington are inventing the future every day, and it is a bad one).

There is plenty of funding around, but what’s required is the kind of funding that was typified by JCR Licklider at ARPA and Marvin Denicoff at ONR.

I saw the first half of 2001: A Space Odyssey, but it was annoyingly slow paced, and I didn't see the need for that "Dawn of Man" sequence. Is it worth watching the rest of the movie?

In Business, “the customer is always right” (this is why we buy/pay) — a large part of the process is to “find a desire and fill it”.

In Education “the learner is usually wrong” (this is why we learn) — a large part of the process is to help the students to grow their minds to be larger than when they started.

If a person going to a movie or theater or buying a book, etc., considers themselves a “customer”, they will be upset if what they are buying is not what they wanted.

If they consider themselves a “learner”, then they will be willing to go through a much longer process to see if learning from “an unusual exceptional mind” will be wonderful after they have changed enough to be able to deal with the ideas.

Today is a tough time for Real Education. Children grow up as consumers/customers and most remain that way as adults. The mass media and consumer industries cater almost completely to satisfying desires, and almost not at all to growth.

The universities are now businesses, with students as customers, and students choose what to be certified in. The pace is set for those things that can scratch an itch quickly without any new learning.

Socrates didn’t charge for his teachings because he would then have customers rather than students …

All this hasn’t put real education completely out of business, but it happens with a very small percentage of the population (generally too small a market for mass providers to be interested).

Another movie that has some of the same approach and pacing, etc., and is also very much made more for “learners” than “customers”, is Ariane Mnouchkine’s “Moliere”. What she is trying to help us with can’t be done like a “customer movie”.

One way to help think about this is to consider the difference between normal prose and poetry (why is poetry the way it is? why put readers through an experience that is not like the ordinary oral discourse they are used to? Etc.)

Or to think about what it means to try to do what real theatre does in a movie? (Especially what Greek Theatre does?) Some of the elemental contact between actors and the human condition and the audience is lost — how to get back the contact?

Mnouchkine was one of the most brilliant theatrical directors in France (and her father was a famous French film producer and actor).

Her solution was to get back the elemental contact and transport by using immersive visual language and design — to the point that you can almost smell where you are — as the poetry and to almost eliminate spoken dialog — this is done astoundingly well and for almost 5 hours (many who have seen this movie — including myself — have reported feeling quite sad when they realized that it was coming to an end).

2001 took much of the same approach 10 years earlier, with, in many ways, a more difficult set of ideas to try to get below/beyond regular thinking.

Quite a bit of great art is dealing with “the world is not as it seems”, and the artists try to find ways to get beyond the quick rejections of ideas that violate what people believe and that which remains invisible to them.

What aspects of Lisp influenced Smalltalk?

I’ve written about this variously elsewhere: including in “The Early History of Smalltalk”, and I’ve emphasized especially that “the idea of Lisp” pervades the underpinnings of Smalltalk as it was developed at Xerox Parc.

For me, it was spending a Sunday afternoon in the late 60s tracing through John McCarthy’s eval-apply for Lisp that rotated my perspective to thinking about computing “that way”. This included how late binding everywhere and finding a universal, composable and embeddable computing element could astoundingly reveal “hidden simplicities” in programming, its design, and implications for programming languages.

This got me to go back and really look at Peter Deutsch’s PDP-1 interactive Lisp in the Bobrow & Berkeley book “The Programming Language Lisp” — the entire PDP-1 assembly code was included in the paper, and Peter had realized that if you’ve got a really good live language, you don’t need a separate OS. I was using a very similar approach for the Flex Machine personal computer, and the Flex language was derived from Euler, an extreme generalization of Algol by Wirth that came close to being a Lisp.

(At Parc the several versions of Smalltalk that we did used internals that were variants of the internals of both Lisp and Euler.)

The peculiar organization of Lisp — once you got used to it — made deep fundamental things easier to think about than Euler (and the general Algolish approaches of the day).

One thing that stands out is that about half the tiny McCarthy interpreter is only there because he was thinking about functions that evaluate their arguments before the lambda form is “applied” to them. This requires forms of quoting to delay evaling for the “special forms” of QUOTE, COND, LAMBDA, etc. But you don’t need that, and the MIT Lisp 1.5 people had already introduced the idea of an FEXPR, which doesn’t eval its arguments at call time, but simply is handed the argument list. The FEXPRs can then eval parameters if needed and desired (this allows e.g. “COND” to just be a vanilla FEXPR).
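A toy sketch (mine, not Lisp 1.5’s actual code) of the FEXPR idea: the evaluator below hands an FEXPR its argument expressions unevaluated, together with the environment, so something like COND becomes an ordinary library operator rather than a special case wired into eval:

```python
class Fexpr:
    """Marks an operator that receives its argument forms unevaluated."""
    def __init__(self, fn):
        self.fn = fn

def evaluate(expr, env):
    if isinstance(expr, str):                            # a variable reference
        return env[expr]
    if not isinstance(expr, list):                       # a literal
        return expr
    op, args = evaluate(expr[0], env), expr[1:]
    if isinstance(op, Fexpr):                            # FEXPR: gets the raw forms
        return op.fn(args, env)
    return op(*[evaluate(a, env) for a in args])         # ordinary call: eval args first

def cond(clauses, env):
    for test, result in clauses:                         # each clause: [test, result]
        if evaluate(test, env):
            return evaluate(result, env)

env = {
    "cond": Fexpr(cond),
    "quote": Fexpr(lambda args, env: args[0]),
    "<": lambda a, b: a < b,
    "x": 3,
}

print(evaluate(["cond", [["<", "x", 2], ["quote", "small"]],
                        [True,           ["quote", "big"]]], env))   # -> big
```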

Also, Lisp raised the sibling question in a thinkable way about passing a lambda with free (global) variables.

What does this mean in the new environment? Look for them dynamically in the new environment? Can be useful.

Bind the globals to their values in the calling environment and package the lambda with its own a-list? Very useful (and very like a Simula-style object).

Search for the values of the globals dynamically in the calling environment? Also very useful, and rather like the thunk ideas in Algol for “calling by name”.
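A toy contrast (mine) of two of the choices just listed, for a lambda with a free variable y: bind and package its value at creation time (closure-like, and Simula-object-like), versus look it up dynamically in whatever environment the lambda is eventually used in:

```python
def make_captured(defining_env):
    y = defining_env["y"]                    # bind the free variable now...
    return lambda x: x + y                   # ...and package it with the lambda

def make_dynamic():
    return lambda x, env: x + env["y"]       # defer: look y up wherever we are called

defining_env = {"y": 100}
calling_env = {"y": 1}

captured = make_captured(defining_env)
dynamic = make_dynamic()

print(captured(5))                # 105: uses the defining environment's y
print(dynamic(5, calling_env))    # 6:   uses the calling environment's y
```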

Lisp also had a second set of interesting ideas that were “loosely” (meaning “not really”) integrated into the rest of the language: the property lists on atoms. If you were thinking “Sketchpad” or “Simula” thoughts, and looked at Lisp, you would see some of what you wanted in closures and some of what you wanted in property lists.

The term “object” in the 60s meant “a compound data structure with fields”, and this was another way to use property lists. Since you could stash procedures on the property list along with anything else you wanted, you could make a whole little world. If you had the procedures have a global variable that contained a pointer to the property list the procedure was part of, then quite a bit could be done.
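A toy version (mine, not period Lisp, and using a closure rather than the global-variable trick just described) of a property list that carries its own behavior:

```python
def make_account(balance):
    plist = {"balance": balance}             # the "property list"
    def deposit(amount):                     # a procedure stashed on the list,
        plist["balance"] += amount           # with access to the list it is part of
        return plist["balance"]
    plist["deposit"] = deposit
    return plist

acct = make_account(10)
acct["deposit"](32)
print(acct["balance"])   # 42
```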

Interestingly, with this aspect of Lisp, the way it was done made it harder to “think good thoughts” than was helpful.

The offspring of Peter’s PDP-1 Lisp was BBN Lisp (1.85) and it supplied “all of the above”, and much more.

Another interesting dynamic back then was Warren Teitelman’s thesis “Pilot”, his heroic pass at McCarthy’s “Advice Taker” via meta programming rather than AI. Many of these ideas were incorporated in BBN Lisp when Teitelman came to BBN. (This and more became Interlisp.)

And then there were more and more really interesting AI systems — many of them thesis projects — that were done in Lisp.

I thought that the key principle here was that Lisp allowed you to learn more — in part by being able to try more and change your mind more — than any other programming system. This was especially the case when you needed something that the bottom of the system didn’t have — the meta reflective properties of Lisp gave you more recourse to still do something that would help.

In a world in which almost nothing was known about programming — even by the best programmers — this was really huge.

I don’t think a lot more is known today about programming than then, though Parc did make a few important advances. But many of the advances happened precisely because Smalltalk manifested its ideas about “instantiating intercommunicating processes” by using Lisp techniques. If I had to do it again, I would retain much of this approach, but would do a very different design in place of “Lisp sharing”.

What does Alan Kay think of the proposed 50th anniversary celebration of Smalltalk in 2022?

I think especially celebrating Dan Ingalls would be a key point in a 50th anniversary celebration (if it weren’t for him, the various Smalltalks developed at Parc would not have happened).

Secondarily, I would celebrate Chuck Thacker, as the main designer and builder of the Alto computer, on which the first Smalltalks were possible and made to run.

There is a lot more than clichéd “engineering” here with both of these guys — it is a bit difficult to adequately explain the combinations of art and technique and will — and real genius — that allow people like these to transcend what’s around them.

I take this perspective because I see computing as something with so many different kinds of degrees of freedom and self-referencing that defy mathematical type proofs for most promising ideas. And, if the topics are “personal computing” and “software science & engineering”, real artifacts are called for, not just to vet ideas but to also use them in ways deeper than philosophical mumblings.

The actual realizations of “thought breakthroughs” in this end of computing have required groups of highly able talents of many kinds to be able to cooperate on projects larger than are possible by a single person.

This means that the celebration should include not only the precursor individuals and groups, but also the extremely important colleagues who operated in the magnetic fields of Ingalls and Thacker.

This makes for a history that is more complicated than the story forms that we humans like to use (and restricting the history to story forms — as has been done by most writers — has had the unfortunate side effect of removing some of the “richness that should not be removed”). This has been the case for other large breakthroughs in computing (e.g. Engelbart’s visions and their important partial realizations by hardware people led by Bill English and software people led by Jeff Rulifson).

Finally, I would celebrate the ARPA community processes (and those of Parc which grew out of these), which made doing this research “happy and easy”, meaning: it was only about finding breakthroughs that enabled and progressed Licklider’s vision: “The destiny of computing is to be interactive intellectual amplifiers for all humanity pervasively networked worldwide”.

There’s nothing happier than working on a truly great vision with great people, and nothing easier than to solve really difficult problems when all the time you have can be spent thinking about and approaching and redefining them in any way you choose.

Why did almost no one in the US know about Charles Babbage in the 1930s, as mentioned in the book The Dream Machine?

Alexis de Tocqueville wrote about Americans in the 1840s: “They have no past and no future, but live in an extended present”.

I think it’s quite fair to say about most Americans that History of almost any kind is very far from their main interests or past studies.

Are modes a feature or a bug in interface design?

Quite a bit of early user interface design was done by the ARPA community starting in the early 60s — as an outgrowth of JCR Licklider’s vision: “It is the destiny of computers to become interactive intellectual amplifiers for all humanity pervasively networked worldwide” — after he was asked by ARPA to fund projects to further this vision.

It was soon discovered that “modes” — being in a particular set of states and not others — were useful when you wanted to be in a particular state of affairs (like typing text in the middle of a paragraph), but painful when you wanted to do something else, and forgot you were in the mode and had to get out of it.

Tom Ellis and others at RAND invented the first really good tablet, and decided to make an interactive system only using a single stylus. This got them to invent the first good gesture/character recognition interface, and then the more comprehensive GRAIL system (GRAphical Input Language system). Many of the interactions of this system were “modeless” (in the sense that the system would let you do whatever you were doing, but would then get things back to being able to do anything). Many insights about what was possible with a good pointing device and a display came out of this system.

https://youtu.be/QQhVQ1UG6aM

By contrast, the Engelbart system, which had many other important features, and included a pointing device, was hugely modey. This was one of several reasons it was difficult to learn. For us at Parc, it was a non-starter when we included children as prime users within personal computing.

At Parc we decided that “modeless” didn’t mean no modes, so much as it meant “you can always do the next thing you want to do and the system will exit you safely out of anything you might have been in that was helping actions you had been doing”.

For example, in the overlapping window UI, what was in a window was a mode, but at anytime you could touch another window and it would come up to the surface to be the new focus of attention. (Everything was a “window”, including buttons and icons, etc.)

As UI got to be more studied and designed, we realized that quite a bit of one important dimension had to do with “cognitive load” (managing the small number of “chunks” to which we can pay attention at any time — the classic “7±2”, which is really more like “4±3”).

The small number of chunks means that we can forget both our current state and often future options when we are engaged in something, and it is one of the jobs of a good UI to help the user with this.

So we need to find ways to allow “the next thing thought about” to be done. We need to organize things so there are options for “next thing” when the user gets stuck. Etc. For example, it really helps to have a visible menu of options just for this purpose (menus that can appear from a function key are often forgotten as an option when the user is under stress, etc.). *Note how bad the Quora UI is for someone trying to write an answer here: they crazily have tied the meager editing menu to the top of this scrolling text rather than having it float to the side or having it always appear at the top outside of the text — so I can’t see the options right now — this is very typical of the generally terrible UI schemes found in the web and web browser …*

One of the big helps was the realization that the “object-oriented languages” being experimented with at Parc, which had the syntax of <object><message>, also furnished a way to think about modelessness in a UI: select an object or objects (which will illuminate menus of possible messages), then select the message you want to send to the selection. This is quite modeless: it allows all kinds of further selections after the first selection, including other objects that are not menus of messages. We loved this identity between the visible and the symbolic parts of the Smalltalk system.

A big insight was by Larry Tesler — and independently by others — that if pointing at text was interpreted as pointing between characters rather than at them, then any selection possibility would be just a “zero to n selection” (with zero putting the selection in between two characters). This meant that any typing action would be modeless — there was no longer an “insert mode” that eventually had to be gotten out of: all typing was “replace the selection”.
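A small sketch (mine) of that idea: the caret is just a zero-length selection between characters, and every keystroke is the single modeless operation “replace the selection”:

```python
class Text:
    def __init__(self, s=""):
        self.s, self.sel = s, (0, 0)            # selection = (start, end); may be empty

    def select(self, start, end=None):
        self.sel = (start, end if end is not None else start)

    def type(self, typed):                      # every keystroke replaces the selection
        start, end = self.sel
        self.s = self.s[:start] + typed + self.s[end:]
        self.sel = (start + len(typed),) * 2    # caret ends up after the new text

doc = Text("helo world")
doc.select(3)         # a zero-length selection between "hel" and "o"
doc.type("l")         # "insert" is just replacing an empty selection
doc.select(6, 11)     # select "world"
doc.type("there")     # "overwrite" is the very same operation
print(doc.s)          # hello there
```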

Another idea that was a venerable way to institute modeless was a “meta key” that could produce a “fleeting mode”. For example, the shift key on a typewriter keyboard was a “fleeting mode”: just getting off it to do something else terminated the mode (as opposed to the “caps lock” key which toggled, and thus introduced a real mode). Many early keyboards in the 60s had lots of different fleeting mode meta keys, and most keyboards today retain a few (such as command, control, option, etc.).

One “mode” that caused a lot of discussion in the early days was “painting/drawing”. We had done one of the earliest good drawing systems in the early days of Parc to make use of the early bit-map displays. There is no question that “drawing wants to be a mode”, you are doing lots of strokes one after the other and you want to stay there for a while. The first such system we did had a visible menu of tools for brush size and gray tones, and these were modeless (the system could detect when you were out of the drawing area and in the menu area). You could use a fleeting mode key to give commands via gesture recognition, etc. The tools showed as the cursor to remind the user about the mode they were in, etc.

One of the experiments was to allow painting/drawing to be “shaken off” by a violent gesture. Note that this was still “modey” — you were explicitly getting out of the painting/drawing mode — but it was at least in the psychic space of tools on the cursor.

A really big annoying mode was to have changes in media (text, drawings, etc.) be permanent. Warren Teitelman had been experimenting with comprehensive UNDOing in UI in Lisp 1.85 (later Interlisp), and most Parc systems incorporated quite a bit of UNDOing into their UIs for both programming and media manipulation. Apple did a good job of following this up for the early Lisas/Macs.

It’s worth noting that quite a bit of modiness has crept back into UIs in computer systems today. I think this is from a combination of laziness and ignorance on the part of so many developers today.

Virtually all real modes are a bug in UI design.

Are there problems where a solution by symbolic computation (using only computer algebra regardless of difficulty or time required to solve them) is absolutely impossible that can only be solved with numerical methods? What are some examples?

Many non-linear problems don’t have algebraic solutions, and numerical methods have been used starting long before computing machinery to help. For example, most orbital mechanics, perturbation problems, fluid dynamics, etc.

What are the most important elements of computing history not widely known by current CS graduates or working developers?

I think this question is much too large for a Quora sized answer (in part because of a combination of a lack of curiosity in the field generally, and a general lack of emphasis on history in schooling).

Just to pick four here in no special order — and from about the same period (~ 1962) — how about:

John McCarthy’s approach to computation and reasoning that allowed large complex interrelated “situations” to progress forward in states but retained the ability to use logic to reason about causality and relatedness (“fluents” etc.) Adopting this idea instead of the much weaker notions of “semaphores” would have made a big difference in every part of computing including much of programming, systems designs, and AI.

Anyone in computing should have read and understood everything that McCarthy did and wrote (this doesn’t mean he was right about everything, but “interesting people are interesting whether they are right or wrong”, and John was one of the most interesting in his field and century).

cf. McCarthy, J. (1963). Situations, actions and causal laws. Stanford Artificial Intelligence Project: Memo 2. Also in “Semantic Information Processing” (Minsky ed.).

Doug Engelbart’s actual goals and plans (as expressed in his original proposal to the Air Force in 1962). Understanding and acting on this would have allowed the Moore’s Law revolution into personal computing to take full advantage of the deepest ideas about it by Doug, and some of those after him who developed these ideas further. (The current state of affairs is such a botch as to not even address most of the important ideas at all, let alone doing them badly.)

cf. Douglas Engelbart (1962) Augmenting Human Intellect — A Conceptual Framework

Ivan Sutherland’s “Sketchpad” system — done in 1962 — which was not just about the invention of interactive computer graphics, CAD, object-oriented design and relationships, but also showed how programming really needed to be developed forward in computing, for all programming, and especially for end-users.

cf Ivan Sutherland (1963) “Sketchpad: A Man-Machine Graphical Communication System”

Bob Barton’s “A new approach to the functional design of a computer system” (~ 1961), which showed how hardware architecture should start to seriously cater to the advent of, and great need for, much higher level programming languages and environments. This led to the Burroughs B5000/5500/6500 etc., but had very little impact on what most people thought mainstream HW and SW was about (and to this day, quite unfortunately).

cf. Robert S. Barton: “A new approach to the functional design of a computer system”, Proc. WJCC, 1961

Note: All four of the above early sixties ideas and insights by out and out geniuses had a great influence on subsequent work in the ARPA community, and especially at Xerox Parc.

Is our capacity for problem solving limited by abstraction? (Particularly in Computer Science)

Marvin Minsky liked to say “You can’t teach calculus to a cat”. So, sure, it is extremely likely that there are problems (and solutions) we can’t understand or even see.

On the other hand, we already use what Jerome Bruner called “Goedelization” to greatly extend our biologically created brain (he meant that we can use our language facilities to build smarter “brainlets” and make them work for us inside our heads — e.g.: calculus is just such an example of us inventing and internalizing a kind of computer/mental-process to extend our abilities to think). The sibling of language — culture — can carry these inventions forward, and especially to young children in an even more powerful way (because they can more easily take on the epistemological perspective of the invention, not just the use of it).

These language and cultural extensions were greatly amplified — some of them likely caused/catalyzed — by the invention of writing, and eventually printing.

Many of the early pioneers of computing — McCarthy, Minsky, Licklider, Engelbart, Sutherland, Papert, etc. — saw that an interactive computer could make another qualitative boost — along the lines of language and writing and printing — both externally and internally.

Most of the world — both in general and in computing — have yet to catch up with and understand these ideas. Most adults — even in computing — use computers very weakly across the board.

The possibilities for children and the adults they will become have been particularly not understood (“quite botched” would be the technical term here).

Still, I think a better — if more cryptic — question for this would be: “Is our capacity for X limited by our capacity?”

I think the answer has to be yes, even with computers being able to take up more and more of the capacity load. At some point we won’t be able to be helped by what they might be able to do.

How do I make an AI program that learns?

Here is a very simple but comprehensive example. You can see what is actually going on in the learning process for “conditioned learning”, and you can deeply learn how it works by making it.

How would one explain AI to a complete beginner?
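(The specific example referred to above is not reproduced here; the following is a stand-in sketch of my own, assuming a simple Rescorla-Wagner-style update, and is not necessarily the original example. The idea is an associative strength that grows each time the stimulus-response pairing is rewarded and decays when it is not:)

```python
def condition(trials, learning_rate=0.2, strength=0.0):
    """Each trial is True (pairing rewarded) or False (not); print the strength as it changes."""
    for rewarded in trials:
        outcome = 1.0 if rewarded else 0.0
        strength += learning_rate * (outcome - strength)    # move toward the latest outcome
        print(f"rewarded={rewarded!s:5}  association strength={strength:.2f}")
    return strength

# Acquisition (rewarded pairings), then extinction (the reward stops).
condition([True] * 10 + [False] * 10)
```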

Who developed the idea of using vacuum tubes in computers?

Take a look at the legendary Tommy Flowers, who was one of the earliest to make computing machines using vacuum tubes. Tommy Flowers - Wikipedia

---

(Prepend). It’s worth saying a few words about what came before. The “logic of true and false” was written about by Aristotle. George Boole showed there could be an algebraic form for logic in his 1854 (and second) treatise The Laws of Thought - Wikipedia. This was extremely influential.

Charles Sanders Peirce in the 1870s showed that what we call the “NOR” (neither … nor) function can make all the other logical relationships of two valued logic — and also remarked that you should be able to make a computer using electrical switches from these!
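A quick check (mine) of that observation, building NOT, OR and AND out of NOR alone:

```python
def nor(a, b):
    return not (a or b)

def not_(a):    return nor(a, a)
def or_(a, b):  return not_(nor(a, b))
def and_(a, b): return nor(not_(a), not_(b))

for a in (False, True):
    for b in (False, True):
        assert or_(a, b) == (a or b) and and_(a, b) == (a and b) and not_(a) == (not a)
print("NOT, OR and AND all recovered from NOR alone")
```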

And there was also “the most influential master’s thesis in history” by Claude Shannon at MIT in 1937–8, which, 50 years after Peirce, rounded out these ideas in a grand way (including circuits for a 4 bit adder, etc.). This leap from ad hoc methods to a much simpler, much more complete way of thinking about these complexities made an enormous practical difference to subsequent work (analogous to the invention of calculus in the physical sciences).

The notion of using vacuum tubes for various computing functions is much older than the first whole computers to use them — for example: Eccles and Jordan made a “flip-flop” memory element from vacuum tubes ca. 1918 (Flip-flop (electronics) - Wikipedia).

In another answer here, Petar Pandzharov mentioned John Vincent Atanasoff - Wikipedia in 1939 for an “almost a computer” (a pretty amazing set of inventions actually). These ideas greatly affected the later Eniac by Eckert and Mauchly.

---

The main problem with vacuum tubes was that they burned out all too rapidly, and this would be a constant problem in even small computers which would require several thousand of them (some would always be burning out). And then there was size, heat, power, annoyingly high voltages for the plate circuits, and the need for quite a few additional components to “help” each vacuum tube with regard to voltages and currents, etc.

Tommy Flowers in the 30s realized that the burnout of vacuum tubes (as with electric lights) came primarily from the thermal shock to the filaments, especially when turned on. He found that keeping some current moving through the filaments (never turning off the tubes) and “easing up the margins” (as they used to say) would keep them alive much longer.

Flowers was also an amazing combination of a genius and someone who could “do everything well quickly” (my favorite “total-hardware-software-process guy” in the Tommy Flowers tradition was Chuck Thacker at Xerox Parc).

These insights were key in the groundbreaking work Flowers did for the Bletchley Park codebreaking machines in WWII in the UK. When doubts arose about whether vacuum tubes could possibly be stable enough to be useful, Flowers argued not with words but just by quickly making and presenting the first of a series of vacuum tube driven computing engines.

(And, yes, these machines did use the older Eccles & Jordan flip-flop circuits in Flowers’ designs for the Bletchley Park machines, including for Colossus in 1943 — Flowers and the Bletchley Park people were readers and did not waste time trying to reinvent things that had already been long invented …)

How do programming language designers manage language complexity?

A specific rather than general answer. For Smalltalk at Xerox Parc in the 70s, the approach was very Lisp-like: create a universal building block, implement that, and then deal with the complexities of designing and making “useful features” — i.e., everything that one uses in Smalltalk would be considered “library” in more feature-laden languages.
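A minimal sketch of that flavor (in Python, not Smalltalk, with all the names mine): the kernel knows only how to send a message to an object, and everything else is “library” built out of that one mechanism.

    # A toy "one universal building block": an object is just a dictionary of
    # message handlers, and the kernel knows only how to send.
    def send(obj, message, *args):
        return obj[message](*args)

    def make_counter():                           # a "library" object built on the kernel
        state = {"n": 0}
        return {
            "increment": lambda: state.update(n=state["n"] + 1),
            "value":     lambda: state["n"],
        }

    c = make_counter()
    send(c, "increment")
    send(c, "increment")
    print(send(c, "value"))                       # 2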

This did not banish complexity, but it did make it both more tractable and also more flexible as the learning curve progressed. It essentially spreads out the design of what programmers will use over much longer and deeper learning curves via building real systems in it.

In order for this to work, the underlying implementation had to include quite a few things that (say) a language like C does not supply. This can add complexity (but really good work by experts here makes a huge difference in the overall simplicity as presented to programmers).

And the collection of facilities presented to programmers (all of which are actually extensions to the simple general kernel) have to also be carefully designed and made by experts. The more powerful and easy the extension facilities, the more careful and expert the designers have to be.

This could wind up in a mess, as with more conventional approaches. The two main forces that turned these properties into big wins were (a) the dynamic late binding and definitional scheme made the creation of a new kind of Smalltalk “relatively easy” (we did 4 main ones in about 8 years), and (b) we had a truly superb system designer and builder in Dan Ingalls, who both was a co-inventor of the approach, and could really make it work.

This allowed a lot of flexibility and many real things to be made, but then every several years, we could also take stock of what we thought we’d learned and compare this to the gracefulness of the language at that point. When there was enough of a mismatch, we would do a whole new system (but build it using the old one as the environment until it could be cut free and stand alone).

How were computer programs and languages like COBOL installed into computers back in the 50s before software?

There was always “software” for “programmable computers” because this was the reason such computers were made (my favorite “first” is Maurice Wilkes’ EDSAC in 1949 — the Manchester people would quibble a bit).

There would usually be a “bootstrap” sequence of (not a lot of) instructions that the CPU could be pointed at. Sometimes these would be keyed in from the console, sometimes they would be in the form of a ROM of some kind (diodes, switches, wires, etc.). These instructions could just read a peripheral of some kind (paper tape or card reader) to get a few more instructions into the machine, and it would then execute them to get more happening.

These would then be enough to read the rest of the tape or deck of cards to get the program and then data for the job to be done.

For example, here is just such a “dead start” panel on the Control Data 6600 ca 1964–5 using manual switches to set the bits of the instructions:

image

A close up of this panel:

image

The 6600 had 60-bit words, with multiple instructions per word. But it also had 10 “peripheral processors” for I/O, etc., with 12-bit words and instructions. The “dead start” code above would be executed by one of the PPs, and this would then continue the bootstrap sequence indicated by these 12 instructions.
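Pulling those pieces together, here is a toy sketch of the shape of such a bootstrap (everything in it is invented for illustration, not any particular machine): a couple of hand-keyed “instructions” whose only job is to read more instructions in from the “tape” and then jump to them.

    # Toy machine: the hand-keyed bootstrap at address 0 reads the "paper tape"
    # into memory and then transfers control to what it just read.
    tape = [("print", "hello from the loaded program"), ("halt",)]

    memory = [("read_tape", 2, len(tape)),   # keyed in by hand: load tape at address 2
              ("jump", 2)]                   # ...then jump to it

    def run(memory, tape, pc=0):
        while True:
            op, *args = memory[pc]
            if op == "read_tape":
                dest, count = args
                memory[dest:dest + count] = tape[:count]
                pc += 1
            elif op == "jump":
                pc = args[0]
            elif op == "print":
                print(args[0]); pc += 1
            elif op == "halt":
                return

    run(memory, tape)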

The first COBOL was done by Grace Hopper, Harold ‘Bud’ Lawson, etc., for the Univac II computer, which only had 2K words of RAM — 12 decimal digits per word for data, and two 6-digit instructions per word for programming — and could have quite a few magnetic tape drives. It took about 60 “passes”, using the tapes as secondary/swapping memory, etc., to compile a COBOL program.

So: the COBOL (and FORTRAN, etc.) compilers never got actually loaded into memory, but just the parts of them needed for the parts of the compilation task being done. According to Bud Lawson, the first time an HLL program was compiled on two different machines from different manufacturers was in Dec 1960 for COBOL on both the Univac and the RCA computer.

Was computing dumbed down by the arrival of computer science in academia?

This is an interesting question! My first reaction was “absolutely not, just the opposite” … but this is the old fogey from the 60s reaction. This is because — pound for pound — by far the best “real computer science” * in the 60s was being done in the major universities around the world (e.g. in the UK: Cambridge, Manchester, Imperial, Edinburgh, etc.; in the US: Penn, MIT, Princeton, CMU, Illinois, Stanford, Berkeley, UCLA, etc.; as well as major universities in Europe: ETH, Eindhoven, etc.).

I then started pondering what has happened since computing in academia “went wide” — especially in the US — starting around a little before 1980, moving from a few top places — MIT, CMU, Stanford, etc. — to pretty quickly becoming a subject matter in the ~4000 or so degree granting four year college/universities in the US.

My thought at the time was “How can they be doing this? There aren’t anywhere near 4000 top people in computing in the world — actually many many fewer. Where are they going to get professors of high enough caliber to offer a real college level curriculum in ‘real computer science’?”

Part of the impetus for this rush was another transition that had started in the 60s: the catalysis of the baby boomers and other factors starting to move universities toward being entities much more like businesses. A good book about this is “Imposters In The Temple”.

This was in full swing and was part of the boom in “CS” departments, regardless of quality.

Another large factor in “business orientation” was a stronger transition in universities toward vocational training rather than the “in-depth understanding of complex issues” that was the hallmark of education. And part of this had to do with the lopsided inflation in parts of human lives (especially real estate), which got more and more students to aim for jobs rather than for enlightenment.

I remember being shocked when Stanford CS — over the protests of many of its professors — decided (I think because of “encouragement” from the provost) to have Java be the initial programming language for incoming students. This is an enormous university with an enormous endowment, but now catering to perceived pulls from business rather than having its own special take on actual “sciences of computing” and actual “engineering disciplines for software”.

I don’t know enough about the whole sector of university computing in the US today to answer the question directly.

My own samplings over the years — via talks and visits to many university departments — have given me the prejudice that the students most definitely know almost nothing about the history of computing, and especially computer science.

Right away, this is a clue that “computing is not being taught as a real field” (for example, in Physics you not only learn F = ma but who came up with it).

In this limited anecdotal survey, I found no student or professor had taken the trouble to type “Engelbart” into Google to find out what Engelbart had really done, what he really thought, etc.

This indicated a real lack of both perspective and intellectual curiosity.

It also really pissed me off (a bunch of us had put quite a bit of effort in the 60s and 70s to invent personal computing and world-wide networks primarily to make it easier to become knowledgeable and then enlightened … )

I found no student who could define “computer science” as anything but something like engineering. The term “science” was kind of like the use in “Library Science” — there was no connection to what “Science” has meant since the 17th century.

This deadly embrace is found in NSF (I spent quite a few years on advisory boards there). It deeply and negatively affects high school curricula. And it has now invaded the elementary school years with really grotesque and substandard ideas about computing.

My answer to the question: It is quite clear that “computer science” has been disastrously dumbed down. I think academia can be blamed for a lot of the problem (because part of its job is “keeping the flame” preserved against the general dumbing down out in the world). I think business can be blamed for “being dumb”: to the Dunning-Kruger level of not thinking they are, and then inflicting this on the general population in various ways, including schooling at all levels.

From this old fogey’s perspective, it is really a shame.

---

Prompted by Will Razen’s comment below, here is how we thought about “Computer Science” in the 60s, when the term was put forth as an aspiration and a question, rather than a done deal.

Science is the attempt to detect and gather phenomena and to explain them by creating models (theories) of some kind that produce similar phenomena, and to do this in ways that try to get around what is weak about our senses and abilities to think.

So, if a bridge is built, it exudes phenomena, and can be studied, modeled and better understood. I.e. there can be a “science of bridges” (and of “structures generally”). In the large, this creates “Sciences of the Artificial” ** i.e. sciences that spring up around the artifacts that animals, mostly we, make.

A wonderful thing about “The science of bridges” is that an improved understanding and better models for “bridgeness” can be in turn used to design and make better bridges, which have properties of their own to be studied …

The “sciences of artifacts” are a most delicious art form and pursuit for those who love and are called to the upward lifting adventure of understanding leading to making leading to understanding leading to …

Most of the sciences — whether of nature or artifacts — will use mathematics of some kind — often newly invented — to help with the modeling process. As in Physics, this shouldn’t be confused with the science part of things.

When Alan Perlis was asked what “A Science of Computing” could mean, he said it was “The science of processes; all processes”. He could as well have said “The science of systems; all systems” (he meant this to be the same answer).

This recognized that algorithms, etc., are a tiny part of what computing is about. Computing is really about understanding, inventing, and building systems. As in many cases in the past in science, when existing math is not up to the task, new math has to be invented. In this case, one of the needs for new ways to think about what’s going on stems from the degrees of freedom available and the addition of the dimension of time.

The degrees of freedom and extent of dynamic relationships in desired artifacts, generally mean that they have to be debugged rather than proved. (And there are parts of math where proofs have some of the same quality — all proofs have to be debugged, some proofs actually require modeling them on a computer to debug them.)

Some of the earliest pioneers realized that the computer was “meta” in that it could be a great vehicle for modeling ideas about itself, so that much of the new math that was needed could be “extracted” from the “process space” itself. Many computer “theories” are models of processes written as running systems that can be debugged and explored.***

————

** see the book “Sciences of the Artificial” by Herb Simon (both a Turing Award and Nobel Prize winner, and one of the founders — with Turing awardee Alan Perlis — of the CMU Computer Science Department).

*** We are sometimes asked how Xerox Parc could have been so inventive and productive in the 70s with just a few dozen computerists. One of the answers is lurking in the above. We thought in terms of systems of processes, made models of these, and ran these models on computer architectures we invented and constructed. I would call what we did the virtuous uplifting spiral of “Computer Science” understanding things to allow various “Computer Engineerings” to make new things, and etc.

Does the Chinese Room prove that true artificial intelligence will never be developed?

A less misleading situation is to ask whether there is something not “mechanical” — or something “intrinsically intelligent” — about atoms that accumulates into “intelligence” and “consciousness” when the atoms are formed into a human brain.

A little more than 100 years ago most people, including many scientists, thought that “life” just had to be about more than the organization of simple building blocks. This was called “vitalism”.

The biologists who did think that life was “only special organizations” — and did the work that started to show this — have been quite vindicated by the last 100 years.

The debates about being able to organize simple components into highly intelligent processes have followed much of the same lines. There’s no reason to go to the trouble of the “Chinese Room” as added confusion. Some people want to think intelligence is something so special about human beings that it transcends science.

To answer the question here: the “Chinese Room” proves nothing at all one way or the other.

The history of learning how life works is “very suggestive” that intelligence is the same kind of idea: special organizations of parts that do not at all have to be intelligent into systems that manifest intelligence.

And, perhaps more interesting — especially philosophically — “life” is suggestive that “consciousness” is also a manifestation of processes organized to be able to look at other processes.

A philosophical argument is “do we get intelligence and consciousness if we can simulate every atom in a brain?”. I think the answer is “Yes”, but that it is a lot more fun and interesting to consider inventing solutions that are closer to realistic engineering.

Lord Kelvin was famous for both his contributions to science (e.g. thermodynamics) and his unfortunate behavior of projecting his beliefs and desires into areas that are best considered by science and engineering (e.g. in 1895, he declared that “heavier than air flight is impossible”, etc.). Since birds are heavier than air, and do fly, he was presumably (we hope!) gloomy about the scaling of flight.

So: what we should be interested in here is the science and engineering of “intelligence” and “consciousness”. Like finding out how life works, it is quite likely both possible and feasible, and needs to be done to settle the issue.

From the practical standpoint, it is hard to imagine that solutions will not be more intelligent and reflective than human beings right from the get-go (we are actually terrible thinkers, given what thinking is all about). This would open up more questions, but I think pondering about this will require a lot more depth than Searle has shown.

What is an interesting and unusual historical computer architecture? Was it definitely a dead-end or could it possibly make a come-back?

A really interesting early architecture was Wes Clark’s TX-2, the test computer at Lincoln Labs for the SAGE air defense system. This was also the machine that Ivan Sutherland used to invent a superset of modern computer graphics in 1962. One of several key features was its interleaved bottom-level zero-overhead tasking architecture with about two dozen separate program counters, all tied to different conditions. This not only avoided normal interrupt processing, but also allowed many processes that were like HW peripherals to be emulated by programs (in fact, the graphics display, pointing device, knobs, etc., system that Sutherland used was created by programming, using the multiple PCs as though they were dedicated machines).
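A sketch of the flavor of that tasking scheme (in Python, nothing like the actual TX-2 hardware; the tasks and conditions are invented): several program counters, each tied to a wake-up condition, with the machine simply stepping the highest-priority ready task each cycle — no interrupt save/restore at all.

    # Toy multiple-program-counter tasking: each generator is a task with its
    # own program counter; each cycle the highest-priority ready task runs.
    def display_task():
        while True:
            print("refresh one scan line")
            yield                                 # give the machine back until next cycle

    def keyboard_task():
        while True:
            print("read a keystroke")
            yield

    tasks = [                                     # ordered by priority
        {"pc": display_task(),  "ready": lambda cycle: cycle % 2 == 0},
        {"pc": keyboard_task(), "ready": lambda cycle: cycle % 5 == 0},
    ]

    for cycle in range(10):
        for task in tasks:
            if task["ready"](cycle):
                next(task["pc"])                  # run that task's next burst
                break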

The classic answer here would be the Burroughs B5000 architectures — Bob Barton was the main inventor of most of the ideas on this machine — which used tag protected memory, polymorphic operations, “byte-codes” that were the semantics of higher level languages, safe multiprocessing (and multiple CPUs), etc.

A separate dimension would be architectures to allow emulation of high level operations as though they had been efficiently included in the hardware. In the 50s, 60s, and 70s this was done by programming dynamically reloadable microcode.

At Xerox Parc we combined the three ideas above in Chuck Thacker’s Alto computer, which had 16 program counters, zero-overhead task switching at the bottom, and reloadable fast microcode, which was used both to make display processes for the bit-map display, the GUI, etc., and also to make B5000-like emulated higher level machines for languages such as Smalltalk, Mesa, Lisp, etc.

Today such an architecture might be done with hybrid FPGAs (but done better than the current Intel offerings).

The basic key to combining flexibility with speed generally has much more to do with how the memory and processing units relate, especially at the lowest levels.

Is computer program portability more important in today's mobile technology?

I’ve always thought it was tremendously important in every era that I’ve been in computing. “Software wants to run on everything” — a huge percentage of computer people seem to bend over backwards to prevent this, but I think it’s a mistake.

Is it possible to program a word-processor or text-editor? How would I go about doing this?

Do you mean to make one from scratch? (They all have been made from scratch originally — more can be made).

Do you mean to have a programmable word processor? (E.g. Emacs is a famous one, Microsoft Word used to be programmable — I don’t know if it still is — and it was quite programmable in its original version as “Bravo” at Xerox Parc.)

Etc.

What thought process would lead one to invent object-oriented programming?

(Per Paul Hudson: “Maybe Alan Kay would answer. Since he did!”)

Well, OK.

In 1992, I was asked to write a history of what I actually did by the ACM for their second “History Of Programming Languages” conference. There are several versions online, including at Bret Victor’s site: “The Early History Of Smalltalk”. This should satisfy those who want more detail.

For the purposes here for Quora, (a) I did coin the phrase “object oriented programming” for what I was working on in 1966, but (b) I thought of what I was doing as a kind of “rotation” and generalization of an idea I had seen in a variety of forms since my days as:

a biologist

then as a programmer in the Air Force starting in 1961: a way to make a “semantic file system”

then in the hardware of the B5000 computer

in the large, as processes in “multi-programmed” and “time-shared” operating systems

most startlingly in Ivan Sutherland’s Sketchpad

in a more prosaic practical form in Simula

and as done with entire computers on networks (as in the upcoming ARPAnet by my research community).

What hit me in Nov 1966 was ridiculously, absurdly simple: “that, if you can have enough computers which can intercommunicate, this is all you need to define anything that can be done on a computer by any other means”. This is because each computer is universal, etc.

Putting aside the practical difficulties — I was basically a lazy mathematician, so it was fairly easy to detach from the industrious pragmatist I had to be as a programmer — this was very attractive as an idea, because it scaled much better than the traditional ideas of procedures and data structures as building blocks.

And it was so much cleaner, and so much more amenable to whole systems designs.

And so amenable to thinking in terms of “designing and programming simulations”. This last was from the two systems I saw within a week of each other that pushed me over the edge: Sketchpad and the first Simula — both of which were essentially set up to simulate complex entities containing many interrelated processes.

Another very attractive feature of having everything being made from “semantic software computers intercommunicating via messaging” is that some ugly properties of “data” could not only be fixed, but even eliminated. For example, standard hardware — and programming languages which map closely to it — can give rise to simple arrays, but not to “data structures” that are just a little more complicated, like sparse matrices, stacks, queues, etc. These are easy to do if they are instead “processes” that get to decide internally what the “assignment arrow” should mean.

And this can lead to something much nicer than “data” that still has state but no race conditions on it. For example, inside the object it can decide to version the histories if it wants, it can do “atomic transactions”, it can only allow viewing when all of its relationships are consistent. And many more. The data idea was always a bad one, and this new semantic building block would allow objects to progress through time — and “learn” etc. — but would be much safer.
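Here is a small sketch of the kind of thing meant here (the names and details are mine, purely illustrative): “assignment” arrives as a message, and the object decides internally to keep a versioned history rather than destructively overwrite, so every reader always sees a consistent snapshot.

    # "Assignment" is just a request, and the object decides what it means.
    # This one keeps every version, so there is state but no destructive
    # update for other viewers to race against.
    class VersionedCell:
        def __init__(self, value):
            self.history = [value]

        def assign(self, value):          # the "assignment arrow", as a message
            self.history.append(value)    # a new version; old views stay valid

        def value(self, version=-1):      # read any consistent snapshot
            return self.history[version]

    cell = VersionedCell(0)
    cell.assign(7)
    cell.assign(42)
    print(cell.value())                   # 42 -- the latest version
    print(cell.value(1))                  # 7  -- an earlier, still-consistent view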

And then there is the idea of simulation itself, where the whole idea of “data” and “state” starts to get eclipsed by “competent objects” that can cooperate, much more like biological cells and human societies can.

So, the thought processes that led to this were basically “systems thought processes” that were about being able to easily define systems of processes: any kinds of systems of any kinds of processes.

This will not resemble much of what most readers will associate with OOP — in the sense of language like C++, Java, etc.

Quite a bit of the disconnect is in point of view/perspective. C++ and Java etc. use objects mainly to define new things that are very like data structures, and the programming that is done is generally very data structure like (e.g. “setters” turn any kind of entity back into a data structure that can be imperatively munged by anyone). Technically, this is actually “Abstract Data Structures” and though a subset of what can be done with objects, is a divergence from the intent.

The picture I delineated above allows a good systems designer to design a good system and make it, even if very large. It’s obvious that much of the benefit of the idea will be lost if skilled systems design is not employed.

An important side point is that the strength and generality of the definitional aspects of this paradigm can be quite independent of programming style (this is the central key of what “simulation of systems” means).

An object can perfectly simulate a function, procedure, a data structure, etc., so everything old can be covered completely, but it can also simulate very smart agencies and agents, to provide much stronger ways of doing things.

The flaw in how things have played out is that very few in computing actually put in the effort to grok the implications of “universal scalable systems of processes”, and instead have clung to very old and poorly scalable ways to program.

At Parc, we got a tremendous boost in what we could conceive and write in very few lines of code because we tried to use as much as possible about what we understood about this new way to organize and program complex systems. (Today, there is more understood about this, but even the first time around, we were constantly amazed at what was possible.)

How do you see Linus' prediction failure of parallel computing?

I’m not sure that he is predicting the failure of parallel computing. The context of the remarks seems tiny, and seems even limited to specific tiny contexts (for example using “locks” at all is a very bad way to try to deal with race conditions, etc.).

I just did a Google search that got back to me in about 1/4 second after considering millions of items of indexed information spread around the world (this seems like parallelism working to me!)

The Internet seems quite parallel …

But the simplest way to ponder this both philosophically and pragmatically is to note that biological neurons have a cycle time of about 5 milliseconds, yet we can do quite a bit of thinking and decision making in from about 1/3rd to 1/2 second. So: “a lot” can be computed by our brains in from 30 to 100 clicks.

Real-time brain scans of metabolism while thinking reveal that hefty percentages of the 86 billion neurons in our brains are doing something related to the thinking task.

Also, if you are familiar with molecular biology — and computer people should take the trouble to learn how all this works — each one of the 10 trillion or so cells in our body has billions of parallel pattern matches and actions. Some of these happen as rapidly as 1 microsecond. These converge to produce all the life cycle functions of each cell, including making more cells. The cells themselves started as a single fertilized ovum, and 45 cell divisions or so later, a baby was produced (it’s worth pondering the difference between this and the log to the base 2 of 10 trillion … what do you think those extra cell divisions were used for?).

This sounds to me like “parallel computing” does work! and on scales that most computer people don’t do much thinking about.

The poor job done in traditional HW and SW on architecture in general and parallelism in particular is likely more reflective of the abilities and predilections of most of the computerists working in those areas.

One way to try to learn some really interesting things is to get an FPGA plug-in box for your computer and start to make highly parallel architectures with it. It’s a lot easier all around with (a) thinking parallel from the get-go, and (b) realizing that many difficulties with parallel computing are actually due to the traditional von Neumann architectures, which separate memories from processing. You can easily commingle these in an FPGA.
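A minimal sketch of the “commingled memory and processing” idea (a toy cellular array in Python — it has nothing to do with any particular FPGA toolchain): every cell is both its own bit of memory and its own tiny processor, and one “clock tick” updates all of them from their neighbors at once.

    # Each cell holds its state and applies its own little rule; a tick updates
    # every cell from its neighbors "in parallel".
    def tick(cells):
        n = len(cells)
        return [cells[(i - 1) % n] ^ cells[(i + 1) % n]    # local rule: XOR of neighbors
                for i in range(n)]

    cells = [0, 0, 0, 1, 0, 0, 0]
    for _ in range(5):
        print(cells)
        cells = tick(cells)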

Which are the earliest real-time text editors?

It would be great to track down a precise answer to this.

There were very early text editors in the 50s using various flexowriters, teletypes, and modified IBM electric typewriters.

On Whirlwind I, the Laning and Zierler “George” interactive programming system ca 1954 had limited text editing in its interactions.

I would look to the pathway between Whirlwind I at MIT and the SAGE project at Lincoln Labs, etc.

Also, the Bendix G-15 in the mid-50s (by Harry Huskey) had a TTY interface, and perhaps a text editing program.

There was “Expensive Typewriter” at MIT in 1961 on their first PDP-1. CTSS at MIT used “typewriter” terminals and had text editing.

I can’t put my finger on the earliest display text editor. Wes Clark’s LINC in 1962 had one (the LINC is a good candidate for the first real personal computer). Hard to believe somebody didn’t do one for the PDP-1 ca 1960.

Let’s see if we can track this down better …

What bottlenecks need to be addressed in order for the real computer revolution to happen?

I don’t think there is “one” real computer revolution, but the general premise that “it” or “these” haven’t happened yet does help pondering (and I also think that where things generally are today is so low compared to what could be done makes the phrase “the computer revolution hasn’t happened yet” quite reasonable).

Most of what I meant when I coined the phrase in the late 70s or 80s was to counter the really low hurdles that the 8-bit micro hobbyists had set for themselves. This was in danger of spreading with the 8-bit micros (yep, it did), and I started going around with some movies from Parc, and a talk: “Don’t settle for anything less!”. (This wound up having very little effect, even at Apple.)

As usual the main bottlenecks are in various “normal forms” of human thinking about most things — which is to accept what is in the world as the standard, and if any improvements seem called for, to make them incremental with the least effort possible.

The real deal is that the invention of computers is in the same class as the invention of writing, of mathematics, of science, even of the printing press (it’s a qualitatively new amalgam of all of these with additional magic they don’t have). These are not increments, but qualitative leaps into very different contexts that have created very different futures.

A normal pervasive human trait is “instrumental reasoning”: basically to hold onto existing goals and to try to use new ideas and new tools to further those goals. Only a small percentage will be able to see very different possibilities and goals that are implied by new ideas and new tools.

Just to check: I’m now going to the Wikipedia article about Seymour Papert’s Logo, the first programming language for children (and an excellent design): Logo (programming language) - Wikipedia

And, as always, there are some words, a few links, but — astoundingly in this article, really astoundingly at any time from the birth of the web and the web browser in the early 90s to especially now — even though it is to be read on a running computer, there is no way to try out any of Logo’s features. For some reason, the Wikipedia people thought that something like an encyclopedia on an active computer should be just like a paper version (but with a few hyperlinks).

This is not just lack of imagination, but a crucial lack of any understanding of what a computer is, and what personal computing has to include.

And note that the web and the web browser were done years after Hypercard, which first showed a comprehensive way how to do all of this for end-users.

How did it get lost? (It didn’t, the web people didn’t bother looking around at all.)

And note that the web and the web browser were done 25 years after Engelbart’s big demo in 1968. (Again, what happened to make the field brain dead over time?)

This is just the tiniest example of the problem.

Another — staying with Wikipedia — is that it doesn’t have a half-way reasonable authoring system even for its limited goals.

And the authoring that does exist via the web browser is quite terrible for end-users, even after 25 years, and even after better authoring systems had been done before the web in the 70s and 80s. (Why isn’t Google Docs — for example — above a reasonable threshold for authoring even the limited media it deals with?)

All this is despite what’s happened with Moore’s Law since the mid-60s, and that the not well designed language Javascript runs many orders of magnitude faster than the fastest hardware done at Xerox Parc (and the amount of memory and storage is astronomical in comparison).

This means really wonderful authoring systems that can be used fluently by both end-users and by pros are quite possible. And “authoring” can include the safe wide-ranging authoring of processes not just simple media.

Because the larger group of end-users didn’t understand what the “computer revolution” should be about, it was the moral duty of people making computer artifacts to lift them. Instead, “normal” kept on being redefined lower and lower, and was aimed at whatever people might buy rather than at including important parts of what people actually need, as well as want.

Some of these needs require much more learning on the part of the end-users than marketing people want to deal with (very like reading and writing).

But we can’t retreat back to cave paintings just because reading and writing are difficult — we have to learn how to teach the new things that lift.

Hypercard is a nice example to point to: according to Apple’s followups, more than 4 million separate people of all ages and kinds made their own scripted applications and applets. That was a revolution! But for a very short time (Apple itself failed to understand Hypercard, and dropped it).

Where is “the equivalent of Hypercard in 2020”?

Is anyone in the computing field even asking this question? Or trying to find it?

If all knowledge of programming and computer science was to be gone, what 10 books/papers could restart the field and bring things back or make things better?

This question could be about a cold start to computing — no computers around at all — or it could be about “the computers and networks we now have are around, but no one knows how to program them”. Or something somewhere in between.

This makes a difference because some of how programming got started — and some of what people still think programming is — came from the tiny size, slow speeds, and limited instruction sets in the late 40s onwards. Old ways of thinking tend to persist regardless of validity.

I have a bit of an urge to try to answer this, but I need more clarity in the question.

How many programmers were in the world in 1964?

It would be great to find definitive answers. I just sent an email to Fred Brooks to get his opinion.

However, it is safe to say — for 1964 — thousands at least, and most likely 10s of 1000s.

This is partly because of where IBM had gotten to by 1964: the somewhat backroom project of the 1401 to replace punched card accounting machines (PCAM), was very successful by then (I recall that they eventually made about 12,000 of these machines through the 60s).

Wikipedia says 2000 1401s were installed by 1961 and these were 1/4 of all the computers in the world at that time (so they estimate 8000 computers by 1961).

Just from the stated number and dates of the 1401 at this time (actually the year that I started programming on a 1401 for the US Air Force), I think we could say that it’s likely that there were more than 10,000 programmers in 1961.

This is because the 1401 was only inexpensive to lease relative to other computers — in today’s dollars a small system of mainframe, card reader, printer (without tape drives) would rent for about $22K per month. They would be run 24 hours a day within large PCAM shops and would likely have 5–25 programmers (there were about 20 or so in my Air Force shop for the 1401). I would guess close to 10,000 1401 programmers alone in 1961.

Wikipedia goes on to say that almost 10,000 were installed by 1964–5, and constituted 1/2 of all computers in the world (so they think: perhaps 20,000 computers in the world in 1964–5).

Just taking the 1401 numbers and dates at face value, indicates many more than 10,000 programmers in the world by 1964.

This early success of the 1401 plus other factors had gotten IBM to try “System/360” — launched in 1964 — to see if they could make a large hardware family in every price range, and try to unify it with one OS and software suite. This was a breathtaking move in any age.

They bit off more than they could chew, but wound up making a lot of money anyway (“No one ever got fired for buying IBM”).

Fred Brooks wound up with the responsibility of the OS (OS/360) and wrote the classic “The Mythical Man-Month” about the experience and lessons learned.

The combination of programmers needed for the many thousands of machines in the world around 1964, plus the hundreds to thousands who worked for IBM (I can’t remember whether it was 500 or 2000 who wound up working on OS/360 alone), certainly adds up to a lot more than 10,000.

How could computers develop their own language and take over the Earth?

Fredric Brown was a terrific science fiction writer in the 40s and 50s. He also loved to write “half-pagers”: entire sci-fi stories in a few hundred words. Here’s a favorite one from 1950:

Dwar Ev ceremoniously soldered the final connection with gold. The eyes of a dozen television cameras watched him and the subether bore throughout the universe a dozen pictures of what he was doing.

He straightened and nodded to Dwar Reyn, then moved to a position beside the switch that would complete the contact when he threw it. The switch that would connect, all at once, all of the monster computing machines of all the populated planets in the universe -- ninety-six billion planets -- into the supercircuit that would connect them all into one supercalculator, one cybernetics machine that would combine all the knowledge of all the galaxies.

Dwar Reyn spoke briefly to the watching and listening trillions. Then after a moment's silence he said, "Now, Dwar Ev."

Dwar Ev threw the switch. There was a mighty hum, the surge of power from ninety-six billion planets. Lights flashed and quieted along the miles-long panel.

Dwar Ev stepped back and drew a deep breath. "The honor of asking the first question is yours, Dwar Reyn."

"Thank you," said Dwar Reyn. "It shall be a question which no single cybernetics machine has been able to answer."

He turned to face the machine. "Is there a God?"

The mighty voice answered without hesitation, without the clicking of a single relay.

"Yes, now there is a God."

Sudden fear flashed on the face of Dwar Ev. He leaped to grab the switch.

A bolt of lightning from the cloudless sky struck him down and fused the switch shut.

Another way to look at this question is to note that computers don’t have to take over the Earth, because humans are giving over the Earth to them.

In other words, worry more about human stupidity than artificial intelligence.

Still, it’s worth looking at the question more literally.

Staying on the stupid side of things for a minute, note that what ML actually does is to “classify/correlate/etc” many inputs into usually fewer outputs. If we tie actions to these we will get a goal seeking mechanism. (Take a look at my recent Quora answer about Explaining AI To A Complete Beginner. The example uses an extremely simple correlating mechanism to learn by repetition.)

We could imagine using a larger version of this to try to buy or sell stocks, or even to start a pre-emptive nuclear war. The problem is that correlation/classification is quite a bit more noisy than most practitioners want to discuss. For example, here is one that has a .993 correlation over 10 years — many in the Pentagon and in the current government might very well see this as definitive and would want to push the button. But:

image

Yikes!

More of these “foolers” at 15 Insane Things That Correlate With Each Other (thanks to Mark Killman for the link).
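A small sketch of why such “foolers” are so easy to find (my own example, not one from the linked page): two completely unrelated random walks very often show an impressively large — and entirely meaningless — correlation.

    # Two independent random walks frequently correlate strongly by accident;
    # "high correlation" by itself is cheap.
    import random
    from statistics import correlation        # Python 3.10+

    def walk(n):
        x, out = 0.0, []
        for _ in range(n):
            x += random.gauss(0, 1)
            out.append(x)
        return out

    a, b = walk(120), walk(120)                # e.g. ten "years" of monthly readings
    print(round(correlation(a, b), 3))         # often far from 0, despite no connection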

To finish, I will put off looking at computers, and instead direct your attention to something mechanical that has already taken over the Earth by developing languages, namely “life”. The three key processes are variation, speciation, and epigenesis.

The first is due to the extreme noisiness of the chemistry of larger molecules at room temperatures and a bit higher. The second is due to the possibility of making a thing that is less noisy than the parts it is made from. The third is that the environment these processes match to (or don’t) also contains the processes and what they make.

So we’ve got both “unstable” and “somewhat stable” working in environments that can include the processes and their residues.

It is well worth pondering how these three principles work at every scale from the atomic to the planetary.

Evolving a brain that can create cultures and languages within the cultures is just the tiniest part of what has happened.

How do I reason that the Single Responsibility Principle (SRP) does not violate the spirit of the Open Closed Principle (OCP)?

In theory, I don’t like this question, but in practice, I do.

But first a quote from Poincare: “Physicists talk about Physics, Sociologists talk about methods”.

He is trying to point out the real and large differences between a deeply developed field, and one that is still trying to find out how to be scientific at all.

When I first started thinking deeply about systems made from intercommunicating logical computers, I thought much more about the messages than the computers.

The latter were there because they could do anything and everything that a computer could do, and so provided completeness all the way down in the semantics.

But in this way of looking at things, a message send is something looking for a service for something, and there is an element of “search” and “retrieval” going on.

So — as I’ve said many times previously — I made a real mistake to not think about what this style of programming should be called — the flippant response to the question put to me in 1966 was “object-oriented programming”, and this has been a clear distraction to people actually pondering what is going on, and what should be going on.

The peculiar way that “Smalltalk” happened — which didn’t turn out as I had planned “Smalltalk-71” should — also made things confusing, because it was good enough to make a big difference at Parc, and we didn’t go back to “the way it should have been”.

One of the good things that we did carry through was to have messaging always be just requests: every “object” could reject any message. (This idea was also part and parcel of the way the ARPA/Parc community did networking, including the Ethernet and Internet.)

And because of this, any message could wind up being a very generic and distributed broadcast request.

From this POV, we can see that it is likely that many too many assumptions are being made in the “principles” above to allow them to be good over time and in the face of scaling.

But we can do something with “servers” and “messaging” that will scale well in a variety of ways.

Let us suppose that we are trying to write some code in a process to accomplish something — and that what we have to work with are only servers and messaging.

We should be able to see right away that what needs to happen outside of our code is that search processes need to find the appropriate servers for our needs.

In the world of the large, there could be more than one service that could be used, and we need to have meta-info that will help the searchers to find the best ones (and they will change over time).
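A sketch of the shape of this (all of the names and meta-info are invented): a request is broadcast as a description of what is needed, servers advertise meta-info about what they can do, the best match is chosen — and any server is free not to match at all.

    # Find a service by broadcasting a described need and matching it against
    # each server's advertised meta-info.
    servers = [
        {"provides": {"translate"},  "languages": {"en", "fr"}, "quality": 0.9},
        {"provides": {"translate"},  "languages": {"en", "de"}, "quality": 0.7},
        {"provides": {"spellcheck"}, "languages": {"en"},       "quality": 0.8},
    ]

    def broadcast(need):
        able = [s for s in servers
                if need["service"] in s["provides"] and need["language"] in s["languages"]]
        return max(able, key=lambda s: s["quality"]) if able else None

    print(broadcast({"service": "translate", "language": "de"}))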

We can see that the important dependencies here are semantic and beyond just matching the kinds of replies we are hoping for.

But it’s also the case that a very large percentage of the gotchas down the road are from too many dependencies that are not understood (and some not even known).

So it’s worth quite a bit of work, in any system intended to be real over years, to do the extras needed to guarantee semantic integrity. This will make an enormous difference in the viability of the system along many dimensions.

The most important principle in using “principles” is to check whether the “principles” are actually at a level that will help overall.

Why is VSCode nearly 300 megabytes while Sublime Text is just 16 megabytes?

“Ouch!” on both the 300MB and the “just 16” … yikes!

What are some good examples of liberal arts points of view brought to computing?

“Liberal arts points of view”?

Although I went to an “engineering” high school (“Brooklyn Technical HS”), I should have gone to “Bronx HS of Science” or “The High School of Music and Art”: the matchups are partly a matter of temperament. However, one of the most useful things that happened to me was what I encountered at BTHS.

After the Air Force in the early 60s, I wound up at the University of Colorado in “The College of Arts and Sciences” with two majors (Pure Math and Biology), and two minors (English and Anthropology), but spent most of my time in the theatre, especially composing music for shows, playing in pit orchestras, and doing various kinds of theatrical design. I also took courses in the arts school.

I had learned to program computers in the Air Force and used this as a half time job at the National Center for Atmospheric Research in Boulder to pay for my tuition, room, and board whilst there.

Too much of all of the above led to overload by the time I graduated in 1966, and I couldn’t bear the thought of either a real job, or going to grad school in math or bio.

This led to the idea to take a year off and learn something about computing (like most programmers, I knew essentially nothing). This led to finding a master’s degree program at the University of Utah (mainly because it was also in the mountains).

This turned out to be lucky, because Utah had just started up an ARPA project — mainly to invent continuous tone 3D graphics (it did) — but also to generally work on the larger areas of interactive computing, especially with graphical displays.

This was a revelation, especially encountering Ivan Sutherland’s Sketchpad before even getting a desk, and a few days later: Simula I.

But much larger was the “ARPA Dream”: “The destiny of computing is to become interactive intellectual amplifiers for everyone, pervasively networked worldwide.”

In the terms of the present question: the “ARPA Dream” was a “liberal arts” vision from many different perspectives.

Another context in 1966 was “computer science” as an aspiration: the first Turing Award winner, Al Perlis, explained: “Computer science is the science of processes: all processes”.

The term “Software Engineering” had not been coined yet, but around was the realization that this new set of soft technologies had both engineering components — beyond Electrical Engineering, like designing and building bridges — and also science components — like the scientific study of bridges and other constructed things. (See Herb Simon’s “The Sciences of the Artificial”, etc.)

Human beings were the main reason for the ARPA Dream. What are they/we? What are we trying to do? How does our cooperative sociability play off against equally built-in impulses for competition? How do we communicate? How do we elevate our built-in kinds of thinking with invented kinds of thinking, and how do we use and invent representations to bootstrap processes and communities — e.g. science — that are smarter than we are? And how do these representations in turn affect the ways we communicate and the new ideas we might have?

A key property of the ARPA community (and the subsequent addition of Parc, which grew out of it) was that it was more cooperative than competitive, and very much followed both the general way science operates, and most especially how scientists and engineers learned to deeply cooperate during WWII and the Cold War.

Because of this — and especially because of the tone set by the ARPA research founder Licklider (a research psychologist) — the community was very open to ideas from everywhere, not just the hard sciences and engineering.

Most especially, it was devoted to “problem finding” not just trying to solve existing stated problems (it realized that when you want real progress, you are asking for departures from existing contexts, and this means that problems defined in the existing contexts may not be good, and could mislead).

To pick just a few of the ARPA venues that were deeply into these larger issues: MIT AI, CMU (AI, CS), SRI (Engelbart), then Parc adding in the 70s, etc.

I wound up never leaving this community.

In order not to run on even longer, I’ll just note that pretty much everything I had picked up in my larger interests, wound up getting used in the projects I worked on and initiated.

Biology: Being able to see the relationships between real scaling and what needs to happen with large networks of processes intercommunicating. Distributed systems. Seeing what Sketchpad and Simula were instances of. Morphogenesis as the future of large systems. Etc.

Math: Being able to see “algebras” that the existence of computational modules/processes could give rise to. Seeing that “intensional logic” was a more powerful way to think about larger computer structures than “extensional logic”. Being able to see what McCarthy was really doing with Lisp. The computer as a metamedium for “all meta”. Being able to think mathematically when inventing a programming language. Etc.

English: What it means to read, write, create literature. How learning these affect deep thinking. What it means to understand from representations. Why we need to start with children and inventing computing and “literature” that is good for them. How the ARPA dream resembled invention of writing and then the invention of the printing press. “McLuhan”. Etc.

Theatre: Why theatre works with humans. What it tells us about communication between entities of all kinds. How this can be used to invent and design a universal GUI. Etc.

Anthropology/Linguistics: Context and culture, and how this needs to affect learning and especially the education of children. Cognitive psychology as a subset of cultural anthropology. How humans actually learn and use languages. etc. Design to elevate whole cultures. Etc.

History, and of Ideas, etc.: I don’t know where to stick this, but I think the largest influence on me, and hence on things I wound up doing, was a very wide range of reading. Besides the simple idea of furnishing many more possibilities for seeing analogies, I think the big deal here is both the size of the “idea-world” that is revealed, and also the range of human thought: most especially including the mostly weak varieties of thought over the ages, with occasional flashes of the most amazing ideas and accomplishments. “Problem finding” is a philosophical idea.

Music and Art: how the “configurative” arts affect us, and what it means to communicate in figurative ways. Esthetics as a deep modality. Etc.

Engineering: this became both a hobby — to be able to do engineering thinking after and sometimes during inventions — and most especially to work with the many genius engineers I was privileged to be around back then. Etc.

So, from where I am in the peanut gallery, I’d say the “liberal arts” had a lot to do with many fundamental inventions that I was — with others — directly involved with, including: problem finding, personal/tablet computing, the bit-map screen, the Parc GUI, dynamic message passing objects, distributed systems of many kinds, etc.

A provocative question for readers: can you see the lack of “liberal arts thinking” in where many of these early inventions have wound up today?

What is the relationship between performance and abstraction?

This question needs more to it. For example, does the “abstraction” contain requirements for certain levels of performance? (as we’d expect if the abstraction is a “runnable requirements” language). If the result meets the requirements, then the relationship is 1:1.

As another example, it was sometimes the case in the 70s that the excellent interpreted APL at the Stanford Linear Accelerator Center (SLAC) would outdo lower level FORTRAN programs. In part, this was because the higher dimensional internal operations of APL had been highly optimized, and also this APL “knew” more of what it was trying to accomplish (see Phil Abrams’ thesis).

In any case, there is certainly no simple relationship between the two.

How did drawing work on the Alto?

Parc started up in 1970–1, coinciding with the advent of the first VLSI chips: the 1103 dynamic RAM from Intel (all of 1024 bits, but in a much smaller package, at lower voltage, and at lower cost than core memory).

For the first time, just a few people could make an entire mainframe “pretty easily”, and we used the 1103 in 1971 to make an emulated PDP-10 (this was perhaps the first whole computer to use an entirely integrated circuit main memory).

We did many experiments for displays, all of them driven by digital bit-maps of the characters to be displayed. This allowed some outlaw thoughts, such as: I wonder if we could afford to use the 1103s for a general bit-map display buffer memory for a personal computer?

There were lots of reasons why we soon determined to afford this. One was that none of the previous computer displays were general with respect to the images that could be displayed: but a bit-map display could be general if there were enough pixels. Gary Starkweather was starting to print bit-maps on his new laser printer — these were general. Ben Laws and I did quite a few early bit-map character sets and started to understand what resolutions were needed, and Dick Shoup and I started looking at the display of images using both half-toning and continuous toning (as the U of Utah was using for its 3D rendering).

image

Making an early “printing quality” bit-map character in the font Lydian Cursive (1972). The character in the size it will appear on the display is shown at the top left. This also is a good capture of the phosphor color we used at Parc.

image

A page of text rendered using the Lydian Cursive font on the experimental display system (1972)

Cutting to the chase …

Because we liked — whenever possible — to do an extensive experiment before trying for a combination of invention and engineering, we used our experimental display generator in 1972 to try out a number of avenues that a bit-map display could support. These included character display in printing-like fonts (above), half-tone images and painting and generating them, and “2 1/2 D” Disney-like real-time animations in multiple levels of depth. We did the same for real-time multiple timbre generation for music playing.

The first digital drawing and painting at Parc was on a system I designed — which was fabulously well built, with a number of additional designs, by Steve Purcell, a truly gifted Stanford student who was an intern in our research group.

I did most of the early drawings and paintings. We first used an actual Engelbart/English mouse from SRI for this (which was a bit tough), but the Alto project also produced a much smaller and higher res digital mouse that was much better behaved, especially for drawing.

Quora will not allow me to put up “actual bits” here, so I advise that you copy these images and look at them at “actual bits” to see what is actually there — otherwise the single-bit images will produce moire patterns with the displays you are using.

Drawing Pooh

image

Sequence of painting the Pegasus

image

The menu of halftones and brushes can be seen at the top of the screen.

These strengthened the case for going all out on the display for the personal computer for Parc that a number of us wanted to do (later named the “Alto”), and this project was started (under the radar) in Nov 1972.

We had adopted Butler Lampson’s urgings to have every Parc invention engineered for 100 users, and this meant we had to be able to build at least 100 Altos. The original desired cost for the Alto was around $12.5K in 1972 dollars, but it wound up being more like $16K (or a bit more). This was before microchip CPUs, and they would have been too slow for many years in any case. So the CPU was made using MSI components (bit-slices etc) on a few boards.

The memory budget was 128KB (in 16 bit words), and the tooth gritting part was to decide to allocate fully one half of this to “just the display buffer” of about 500,000 pixels (808 x 606). In practice, the Alto could allocate regions of memory for display purposes, and had a “fat-bit” mode, so there were many possible tradeoffs to put up a “full-paged display”. Pragmatically, this allowed working memory in the Alto to range from about 64KB to around 80KB, and a bit more for very restricted purposes.
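(As a quick check on that budget: 808 × 606 is about 490,000 pixels, and at one bit per pixel that is roughly 61 KB — indeed close to half of the 128 KB total.)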

Most important was that the Alto was microcoded with dynamically reloadable microcode, so almost every part of what is normally hardware was emulated. This allowed many experiments with the display system (entirely in microcode), including both Steve Purcell’s “CHAOS” 2.5D animation code, and for Dan Ingalls’ now everywhere in the world: BitBlt.
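For readers who haven’t met it, here is the core of what BitBlt does, as a small sketch (mine — nothing like the microcoded Alto version): copy a rectangle of pixels from one bit-map to another, combining each source bit with the destination bit under a chosen rule.

    # Minimal BitBlt: move a w x h rectangle of bits from src to dst, combining
    # source and destination with `op` (copy, OR, XOR, ...).
    def bitblt(dst, src, dx, dy, sx, sy, w, h, op=lambda d, s: s):
        for y in range(h):
            for x in range(w):
                dst[dy + y][dx + x] = op(dst[dy + y][dx + x], src[sy + y][sx + x]) & 1

    screen = [[0] * 8 for _ in range(8)]
    glyph  = [[1, 1], [1, 1]]                                     # a 2 x 2 "brush"
    bitblt(screen, glyph, dx=3, dy=2, sx=0, sy=0, w=2, h=2)       # plain copy
    bitblt(screen, glyph, dx=4, dy=3, sx=0, sy=0, w=2, h=2, op=lambda d, s: d ^ s)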

image

The small high res optical shaft encoder early Alto mouse

image

The Alto with the Cookie Monster, now animated with a Pegasus, and about 80 “ping-pong” balls at about 10 fps. Animation system: “CHAOS” by Steve Purcell.

https://youtu.be/Watp6srs9Jg

And here is what I wish Quora — and all places for text and pictures on the net — would do: allow a drawing to be done in place in a galley of text and images (why not for goodness sake!). This is from around 1974–5 on the Alto in Smalltalk. The idea in the UI is that a frame sprouts around the object when you go into it (UI and this painting system by Ted Kaehler). The image/painting object had most of the features of MacPaint 10 years later.

image

Note the multiple fonts in the text, the live font editor which is being used to render the text into the Elvish script Tolkien devised, and the drawing in place in the image of Frodo and Gandalf.

Below is a little movie of the place within Parc that we built for learning with children ca 1975 or so. You will see that some of the Altos still just protect the users from the display using cardboard containers.

The music — “The Happy Hacker” composed by Chris Jeffers of our group — is played in real-time on the Alto using Steve Saunders’ FM synthesizer and Ted Kaehler’s Twang music system. The sound this produced was really good. It has been compromised a bit from several generations of recordings, 16mm film tracks, sprocket jitter, etc. But I think you can still hear the polytimbral voices pretty clearly on headphones.

https://vimeo.com/111334072

A few years later, the Notetaker Smalltalk kernel was done for both the Notetaker and the Dorado (called “Smalltalk-78”). Here is a screen in actual pixels (again: please copy and display elsewhere to get rid of the moire on the background) from a revival of this system (from a file on a Xerox disk pack) using the Dorado screen format.

image

You can see this system being demonstrated, including the painting, in the tribute video I did for Ted Nelson a few years ago:

https://www.youtube.com/watch?v=AnrlSqtpOkw&t=135s

This system is partly of interest because it was the one that was used in the infamous “Parc Steve Jobs” demo.

To what extent did Simula influence Smalltalk?

To a large extent. Basically: Simula cannot be praised too highly — and along many dimensions.

I was asked to write a comprehensive history of Smalltalk for the ACM’s “History of Programming Languages” — called “The Early History Of Smalltalk”. This can be found online. One version of it was posted by Bret Victor.

The original history was published by SIGPLAN in 1993.

Here you will find all the influences I was aware of that helped me in 1966 come up with the notion of “all semantics/computations/everything being in terms of messages between logically complete computers”. For this, Simula I was a catalyst after — especially — seeing Ivan Sutherland’s Sketchpad a few days earlier.

You will also find how Lisp came to play an important role to replace the Simula-Algol scheme with a late bound reflective one, and how Dave Fisher’s CMU thesis (1970) was key.

Finally, the 4 different Smalltalk designs at Parc and how they happened are examined in some detail.

What was the historical significance of FORTRAN?

The historical significance of FORTRAN is primarily due to the level of software engineering in the system, but there was also a “four minute mile” effect from the combination of “early” and “efficient”.

An earlier significant system along the lines of “runnable mathematics” was by Laning and Zierler — an interactive system called “George” ca 1952 — on the Whirlwind I computer at MIT. The syntactic and semantic facilities provided by George exceeded those of FORTRAN, and it also was one of the very first interactive programming systems. (Apparently FORTRAN was already started at IBM by 1952.)

Especially in the 50s and into the 60s, with tiny and slow hardware, hand optimization of code in assembler was one of the main jobs for programmers. The IBM group headed by John Backus determined to make compiled code compete with optimized assembly code in both speed and space. This led to long, many pass compile times — thought of as a luxury back then (programmers were cheap), but the results were quite good enough in many cases.

The extent to which IBM carried this out was astonishing. For example, you could actually write and run FORTRAN on the 8KB 1401 at USAF Air Training Command (where I worked in the early 60s). I think (I hope someone will track this down) that the FORTRAN compiler would do over 100 passes — from tapes to tapes — to do a compile. This was somewhat frowned on by the brass at the time.

FORTRAN’s (relatively) efficiently running code produced two distinct kinds of user groups: (a) scientific programmers and (b) job shop programmers who wanted to get certain programs running well enough and quickly. The FORmula TRANslation part was limited to just arithmetic expressions: everything else in the early FORTRANs was quite basic and at the level of control structures and simple macros in assembler.

What are your favorite analogies to explain programming?

I have sometimes used “creating a banquet” or “organizing a restaurant”.

The “following a recipe” analogy is included as a tiny part of the larger processes, but it doesn’t capture many of the important parts of programming, which involves design, organizing many processes — including intelligent entities — to cooperate at many different levels, the systems designs, the many heuristics that deal with “variance”, the learning involved, and so forth.

If the above is to work as an analogy, I should not have to give more of an explanation.

Is it possible to learn two programming frameworks at the same time?

In that galaxy long ago and far away (the 50s and early 60s) there were no standard operating systems, and most programming was done in assemblers that were symbolic representations of the underlying machine codes. And many “shops” had several computers of different types, and there was migration …

So most programmers back then had to learn multiple machines, assemblers, macro systems etc. This led to a kind of “abstract machine” that one wound up thinking in for programming. Some of the better programmers wrote portable macro systems that they would move from machine to machine, that would translate into whatever assembler existed (e.g. Bruce Baumgart had a really nice one).

The machines were quite idiosyncratic back then, but the tendency was to homogenize them at the next level of abstraction.

Soon after FORTRAN in the late 50s, there was an explosion of “high level” languages starting in the early 60s, and many of them were pretty specific about their application areas.

Most programmers would learn several of these as well. This led to a very different realization: that it was really good to learn different kinds of programming languages because it expanded one’s thinking and approach. For example, a popular trio was Algol, APL, and LISP.

However, there was a lot to be said for homogenization as well. This led to the scheme called UNCOL (pronounced “uncle”) by Ted Steele (UNiversal Computer Oriented Language). The idea was to have this more as a universal target for HLL compilers than to be directly programmed in.

C’s best role was to act as this target. But currently this role is filled by LLVM pretty well (especially as compared to the more modest goals of UNCOL).

It should be pretty clear that learning only one programming scheme is terribly limiting, and this is already within a very (so far) limited range of options.

Why do musical artists - even the best ones - lose their ability to write a memorable new melody, far before they lose the ability to sing or play?

For the same reason that it is much easier to read one’s own language than to write in it — especially to write beautifully.

Are there pianos with more than 88 keys?

Take a look at the Boesendorfer Imperial Grand. It has 9 extra keys in the bass that extend the compass down to a low C (so 97 keys).

image

Even more important is that Boesendorfers are wonderfully sounding instruments, with a very creamy, french-horns-to-flutes quality in the tenor through soprano range.

How many of ARPA/PARC’s researchers were married while they did their breakthrough work?

With regard to the computer researchers there, I’d say “quite a few” (I don’t know how to be more precise). And I would guess that the physical sciences researchers were even more so.

How would one explain AI to a complete beginner?

Many single-celled microorganisms are able to detect toxic conditions — temperature, acidity, etc. — and initiate a “tumbling” behavior that finds random directions that can move them away from the danger. Similarly, many can detect when “things are good” and even guess at “better” and move in that direction.

A thermostat is a machine that can detect “good” and “bad” and initiate reverse actions to counter “bad”.

W. Grey Walter created a “robot tortoise” in the late 40s that used this idea to avoid being trapped by obstacles, find its “nest” for recharging its batteries, etc.

image

A time lapse photo of a Grey Walter turtle finding its nest.

The next step up from mechanisms that can stabilize a goal was also illustrated beautifully by Walter, in having one of his tortoises be able to do “conditioned learning” (like Pavlov’s dog), to associate the sound of a whistle with a “bump” and to initiate the “bump avoidance” behavior when a whistle was blown. *

Walter called this “The 7 Steps Between Chance And Meaning”, and used this to make an amazing design and mechanism.

image

An “Etoy” to implement Grey Walter’s 7 Steps.

Steps 1, 2, 3 notice a coincidence that could be significant, which is remembered briefly (4). If there are enough of these in a short enough period (5) this is remembered as significant in a longer-term memory. After this a stimulus (6) will be looked up in the long-term memory (7), and if a memory is found, it will initiate the behavior without needing the original stimulus.
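A tiny sketch of that kind of conditioned learning, loosely following the seven steps as described above (my own toy code, not Walter’s mechanism or the Etoy):

```python
# Toy version of the "7 Steps": coincidences of whistle + bump are briefly
# remembered; enough of them within a short window get promoted to long-term
# memory; after that the whistle alone triggers the avoidance behavior.

class SevenSteps:
    def __init__(self, needed=3, window=5):
        self.recent = []            # short-term memory of coincidence times (step 4)
        self.learned = False        # long-term memory (steps 5 and 7)
        self.needed, self.window = needed, window

    def sense(self, t, whistle, bump):
        if whistle and bump:                                   # steps 1-3: notice a coincidence
            self.recent = [x for x in self.recent if t - x < self.window] + [t]
            if len(self.recent) >= self.needed:                # step 5: promote to long-term memory
                self.learned = True
        if bump or (whistle and self.learned):                 # steps 6-7: look up the stimulus
            return "avoid"
        return "carry on"

tortoise = SevenSteps()
for t in range(3):
    tortoise.sense(t, whistle=True, bump=True)                 # "training"
print(tortoise.sense(10, whistle=True, bump=False))            # -> "avoid"
```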

** I wrote an “active essay” for children using these ideas “Marvin Minsky And The Ultimate Tinkertoy”

Marvin Minsky And The Ultimate TinkerToy***

image

What’s fun is how the rest of what we call “intelligence” can be made from these two examples (notice e.g. that the conditioned “stimulus” has become a kind of “symbol”).

“ML” comes directly from the second example — it is a kind of “curve fitting” or “function finding” or “correlation”.

But the most important part of “real artificial intelligence” comes from being able to do things with “symbols”. Much of this has gotten lost in the last few years because it is much more difficult than the curve fitting parts. But it should be readily apparent that trying to make autonomous mechanisms that don’t know what they are doing — the tragic Boeing 737 Max should be a sobering reminder — is a very bad path to take. It’s imperative that symbolic mechanisms that can deal with contexts — not just “fits” — be developed.

Daniel Kahneman**** calls the curve-fitting fast reaction mechanism in human brains “System 1”, and the much slower but much more intelligent cognitive mechanisms “System 2”. Both of these are needed for any kind of reasonable “Artificial Intelligence” that could affect human lives.

A systems organization for “artificial psychology” was suggested by Marvin Minsky in his “The Society of Mind” and “The Emotion Machine” books.***** One of the many interesting problems Marvin tackled was how different parts of a brain/mind could communicate with each other by only using mechanisms that were biologically plausible.

“Real artificial intelligence” doesn’t need to do things biologically, but it is very likely that any real solutions will have been “biologically inspired”. For example, consider the problems of relating the kinds of processes that ML can build with the kinds of relationships employed in human use of language. This is not a trivial problem. It’s clear that humans don’t do this very well either, and this is probably a main clue.

---

* Grey Walter: A machine that learns, Scientific American (1951) 185(2):60–63

*** Yoshiki Ohshima helped greatly in getting the “active” part embedded and working, using John Maloney’s “GP” interactive language for children.

**** Daniel Kahneman “Thinking, Fast and Slow”

***** Marvin Minsky: “The Society of Mind”, and “The Emotion Machine”

What did you think of the work being done at General Magic?

I was “in the conversation” (along with John Sculley, etc.) when this project and then company was set up. Basically, the idea was to make a hand-held device that would embody something like the next levels of HyperCard as a cradle for many kinds of services gotten via networking.

Marc Porat was the driver of this project, and wrote to John Sculley in 1990: "A tiny computer, a phone, a very personal object . . . It must be beautiful. It must offer the kind of personal satisfaction that a fine piece of jewelry brings. It will have a perceived value even when it's not being used... Once you use it you won't be able to live without it.”

Sound like an iPhone? Actually the deep concept was even more so. Quite a few of the original Mac team — Bill Atkinson, Andy Hertzfeld, Susan Kare, etc. — wound up working on this.

Outside of Sculley, “Apple Management” was not interested in this (in the same way they were not interested in HyperCard, and any number of other good ideas around this time). This general lack of interest eventually brought the entire company to its knees just a few years later.

Many good things were done by the new company wrt both hardware and software (worth checking out via Google for those interested). I especially liked the UI, the extension from HyperCard, and how the end-user programming was thought of.

I believe this product could have been a success with the “old Apple” behind it. It was really pushing the state of the art for its day and thus required even more top-level people than it had, plus a sustained marketing push that only a very large company could make happen.

One could argue that it was too far ahead of its time, in that it required too few actors to do too many of the myriad things that gradually happened as the Internet took hold. However, what we wound up with (via the WWW and its browser) was quite terrible by comparison, and has been greatly limited by the limited concept of the web and its browser.

Why was Alan Kay's Smalltalk in the Xerox Alto sufficiently efficient for a desktop GUI in 1979 but it isn't sufficient today?

The two answers by Eliot Miranda and Mark Miller are very good, so there is little to add here. But I should say a few words about the general Parc approach to HW architectures (and more can be found in other answers I’ve given).

First, though, is to compare a one-bit-per-pixel architecture on a 16 bit word machine to a 24-bit-plus-alpha architecture on a 32 or 64 bit word machine. There’s a factor of 8 to 16 difference per memory cycle (3 to 4 doublings of Moore’s Law) just for this.
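A quick sanity check of that factor, using the word sizes mentioned above (my arithmetic, for illustration only):

```python
# Pixels that fit in one memory word: 1 bit/pixel on a 16-bit word machine vs
# 32 bits/pixel (24-bit color plus alpha) on 32- and 64-bit word machines.
alto = 16 // 1        # 16 pixels per word
word32 = 32 // 32     # 1 pixel per word
word64 = 64 // 32     # 2 pixels per word
print(alto // word32, alto // word64)   # -> 16 8  (the factor of 8 to 16 above)
```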

The problem with the 16-bit micros in the early 80s — such as the Motorola 68000 (used on the Mac) and the Intel 8086 (used for IBM PC compatibles) — was not just cycle time but also the fundamental difference between being able to stick “hardware like functions” (such as a byte code interpreter, graphics painting, music, etc.) into a microcode engine that could run about 5–6 times faster than main memory vs. having to do this with a CPU that is using main-memory cycles for fetching its own code and data while also trying to do low-level functions.

The Alto was fabulously efficient because of its approach to how its real CPU (the microcode engine) interacted with the available memory bandwidth (including the use of multiple program counters for zero-overhead low-level task switching and avoiding interrupts).

Whereas the commercial CPU architectures had very poor notions of how to use memory and how to emulate low-level functionality. This made them extremely inefficient for any given clock cycle they could muster.

We put in quite a bit of energy during the last part of the 70s trying to explain software needs and how this should affect hardware design to both Intel and Motorola — and giving them the same kinds of demos that Steve and the Apple people saw — but to no avail. Neither company ever got interested and savvy in software.

In the 80s, this was even more difficult because the IBM PC architecture — for a deeply naive market — was quite successful (this is like the WWW/browser, which is a terrible architecture, but the market can’t see it).

When at Apple, old Parc hands urged the development of custom CPU chips and other custom hardware, but this was rejected for almost two decades (Steve mentioned this in the introduction of the iPhone).

A simple bottom line is to notice how difficult it was and how long it took for the industry to wake up to the need for GPUs, even though screen painting was a major need for every personal computer and workstation. This kind of blindness is almost a defining characteristic of “computing” but I’ve never been able to understand quite why — especially given how successful Parc was because it was willing to do any and all hardware that advanced software required.

Is human born in sin?

Here’s an attempt at a non-religious answer.

We humans have many paradoxes that we deal with quite unevenly. For example, genetically we have urges to cooperate — we would not be social animals if we didn’t — and also to compete. We desire to be in a group, and — once there — we start competing — and often cheating — once we have a bit of cooperation assured. The conflict between these two genetic drives has caused a lot of trouble and misery.

This and other such paradoxes can be recognized as such, and have led to various theories of “good” and “bad” behavior. Some of these are very compete-centric, such as social-darwinism, and some are very cooperative-centric.

Another part of our genetics has to do with stories, how we use them as explanations, and how we invent what stories need in order to be even better stories. This and a few other tendencies we have readily lead to the notion of “sin” — a more emotionally charged version of “bad” — which doesn’t require a deity, but very often a deity is part of some of the larger stories cultures comfort themselves with.

A related example is the idea of “soul”, which is often reified into a thing, and also very often connected with beliefs in deities. But notice how useful this is if it is kept as an idea and concept: it is the thing about yourself that you most don’t want to lose. Similarly, “sins” are those things that go against your best conceptions of yourself and the societies you live in.

Why do Fender guitars have different headstocks than other guitars?

Just to introduce the concept that there were things happening before the near present … i.e. in the past.

One of the first guitars to use an inline headstock was the Fabricatore ca 1806.

image

The idea was used in many guitars by the top German builder of early Romantic guitars: Stauffer. Here’s an example from 1830.

image

Which programming language is best for developing a website that can scale to over 100 million users?

By the end of 2020 it is estimated that there will be about 50 billion computers connected via the Internet. How to invent such an architecture — and one that never has to be shut down for fixing and growing — was worked out by the ARPA-Parc research community over some years centering on about 45 years ago.

This is not primarily a programming language problem, but as with most things that scale, it is a design problem.

The question gives too little information to go much further (for example, what kinds of services, what kinds of interactions between users, etc.).

But it wouldn’t hurt at all to start with the idea of a distributed system here, and to think about how to do sharing with processes that are distributed (and possibly redundantly sited as well).

A really fun (and “funny”) way to do it — and good for folks to ponder — is to note that everyone who will be using these services will be doing so through a very powerful and high capacity computer (laptop, phone, etc.). So there is already a distributed processing system of 100 million potential processors and storage. The Croquet systems we’ve done over the years show that a considerable amount of the load can be distributed and shared via the users themselves …

Once you work out a good such architecture, it is likely that really convenient and flexible dynamic languages will have plenty of computing power for most of the needed tasks.

“As scaling increases, architecture dominates materials” (some old fart said that many decades ago, but guess what?)

Etc.

What were the biggest mistakes made by Xerox?

It’s worth noting that IBM had no idea about personal computing (they would never have farmed out the OS if they had taken it seriously), and didn’t know how to sustain their early success.

Nor did DEC.

Apple got the GUI, but not networking. And, Apple didn’t understand what Microsoft did (Apple was for quite a while the second-largest software company in the US, but they were quite timid about this).

Microsoft did not understand the Internet.

The people who did the web and the web browser didn’t understand “personal computing.”

Etc.

Xerox was earlier — and a copier company — and didn’t understand most things.

All of the above is worth pondering because the ARPA-Parc research community did have a very good picture of the whole gig, and this was what allowed it — and especially at Parc — to invent most parts of the future as a well fleshed out ecology covering the hardware, software, GUI, OS, etc.

In the early 60s, I made my living and worked my way through college as a programmer on mainframes in the Air Force and the National Center for Atmospheric Research. I had no knowledge of what ARPA was doing, and more importantly for this answer: no overlap with their sense of “desirable destiny” when they looked at computing.

The only difference was that in 1966, when I accidentally wound up at an ARPA-supported grad school and saw the “ARPA Dream” and what they were trying to do, I went crazy with the sweep, scope, and destiny of it all, and joined the other “crazies” already in the ARPA community as one of a few hundred.

What made it impossible to even think about anything else is that the “ARPA Dream” was sustained by deep engineering and science — there was no question that the “dream” was actually doable: we just had to figure out how.

None of the other organizations above really could see the “ARPA Dream,” nor that it required the invention of a whole new system of hardware, software, networking on every scale, advances in human factors in many directions, etc.

Each organization picked out something they could make money with and that appealed to their local goal structure (including Xerox, which only latched onto the laser printer). But none of them were willing to abandon their old contexts and move wholeheartedly into a quite new total context.

This “goal conservatism” is also called “instrumental reasoning” (any new idea or tool is judged by how well it contributes to current goals). Most of humanity has this trait (and it’s easy to see why).

My view, looking back, is that there were a few people in the right place at the right time who could see new goals when they looked at computing, and the funding of Licklider by ARPA was a huge lucky step. This funding, and the activities it gave rise to, attracted people of like mind, and — as with me — those who could “see when shown.”

So: I look at most corporate shortsightedness as “unfortunately normal” and at ARPA as “unusual” and “lucky.”

What happened to Dan Ingalls’ Smalltalk interpreters?

I think quite a few of them are around and working in different forms. Dan has given talks showing live examples from almost all of the Smalltalks for which he was the main kernel force. Many of these talks are on YouTube.

Online, there are playable versions of Smalltalk-76, and a usable version of Smalltalk-78 that were brought back to life primarily by Dan and Bert Freudenberg.

There are also playable versions of Smalltalk-72 in several forms, including on the excellent Alto emulators done by Josh Dersch at the Living Computer History Museum in Seattle.

What parts of Engelbart’s Augment/ARC work are still missing from modern groupware, and why are they important?

A direct answer here would be much larger than “Quora sized”. This is because Doug Engelbart’s conceptions were qualitatively different than most of the much tinier ideas and visions of today’s developers.

The wonderful Bret Victor wrote the best obituary for Doug shortly after he died: A few words on Doug Engelbart

It’s worth reading the entire piece. Two good extracts are:

“When I read tech writers' interviews with Engelbart, I imagine these writers interviewing George Orwell, asking in-depth probing questions about his typewriter”

and

“Almost any time you interpret the past as "the present, but cruder", you end up missing the point. But in the case of Engelbart, you miss the point in spectacular fashion.

Our hypertext is not the same as Engelbart's hypertext, because it does not serve the same purpose. Our video conferencing is not the same as Engelbart's video conferencing, because it does not serve the same purpose. They may look similar superficially, but they have different meanings.”

The best way to get answers to the Quora question is to read Doug’s original proposal from 1962, i.e. 57 years ago, submitted while Ivan Sutherland was inventing interactive computer graphics (and Ivan is mentioned in the proposal).

image

http://www.dougengelbart.org/pubs/papers/scanned/Doug_Engelbart-AugmentingHumanIntellect.pdf

What's arguably the single most amazing thing that computers have made possible?

I think the answer to this is very like the answer to “what is the single most amazing thing that writing (and then the printing press) have made possible”.

It is not just that writing and the press made possible a very different kind of travel through time and space — this is wonderful and important — but that a new manner of traveling through ideas was a consequence of what it meant to learn to read and write fluently. Many studies have shown that a literate culture thinks in ways that are qualitatively different than traditional oral cultures, and that the correlation between writing and civilizations is not a coincidence.

Further qualitative changes came from the advent of printing, and both of these changes are a bit puzzling since each was originally a kind of automation of what came before: writing of speech, and printing of writing. In both cases it was the “what elses?” that made the difference. The “what elses?” have to do with the “what’s differents” that happen when one gets fluent with any kind of tool, especially those that carry ideas as well as actions.

There’s a lot more to all this than fits in a Quora-sized answer, but for a start, consider what writing and then printing mean for both description and argumentation: new ways of writing and reading in form, extent, organization and kind of content are now available, and these co-evolve with new kinds of ideas.

In this light we can ask about what computers bring that is qualitatively new and important. Consider what it means to not only describe an idea, but to also be able to simulate it, to carry it out, to explore its implications and hidden assumptions in ways never possible before. JCR Licklider — who set up the original ARPA research that has led to today’s personal computing and pervasive networking technologies — wrote in 1960 (paraphrasing a bit): “In not too many years the partnership between humans and computers will think as no humans have thought before”.

This vision was first analogized to complementary tools and vehicles, but was soon seen as the much larger idea of a change in types of communications and kinds of thought that would be as revolutionary as those brought by writing and the press.

To understand what has happened, we only need to look at the history of writing and printing to note two very different consequences (a) the first, a vast change over the last 450 years in how the physical and social worlds are dealt with via the inventions of modern science and governance, and (b) that most people who read at all still mostly read fiction, self-help and religion books, and cookbooks, etc.* (all topics that would be familiar to any cave-person).

One way to look at this is that when a new powerful medium of expression comes along that was not enough in our genes to be part of traditional cultures, it is something we need to learn how to get fluent with and use. Without the special learning, the new media will be mostly used to automate the old forms of thought. This will also have effects, especially if the new media is more efficient at what the old did: it can result in gluts, which act like legal drugs (just as with the industrial revolution’s ability to overproduce sugar and fat); it can also overproduce stories, news, status, and new ways for oral discourse.

On the other hand, pretty much all science and engineering is only possible with computers, and mostly because of the computer’s ability to actively simulate ideas (including the “idea of thought” itself) coupled with the great extensions to what the printing press has already wrought.

Einstein noted that “We cannot solve our problems with the same level of thinking that created them”. We can use computers to help in new ways to solve many of our largest problems.

On the other hand, we will be in terrible trouble if we use computers to create new levels of problems for which we don’t have the stronger levels of thought to avoid and repair. A good analogy is that “nuclear weapons are dangerous in any human hands”, but “nuclear weapons in the hands of cave people are much more so”.

A great line by Vi Hart: “We must ensure that human wisdom exceeds human power”.

And we don’t acquire wisdom without considerable effort, especially with children who are just starting to form their ideas about the world they were born into.

---

* See “most read books over last 10 years in the US” e.g. from USA Today website.

What is extensibility in object-oriented programming?

Programming languages have appearances (“syntax”), meanings (“semantics”), and efficiencies (“pragmatics”). A really good “extensible language” will allow each of these to be extended (and in large ways when this is a good idea).

It’s worth noting that the procedures and functions of an Algol-like language (C is an example) allow new operations to be programmed and invoked by names in ways that are parallel to the built-in operations. In some languages, some of the symbols can have generic meanings and be represented by more than one concrete meaning (for example, floating point arithmetic in most languages uses the same symbols as integer arithmetic). Some languages allow more meanings to be given by the programmer — for example to define complex number arithmetic. Some languages allow existing symbols — like “+” — to be used — “overloaded” — for this, while others require a different name to be used.
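As a small illustration of the “meaning” kind of extension described above, here is what overloading an existing symbol like “+” for complex arithmetic looks like in a language that permits it (Python used here purely as an example, not as anything from the original answer):

```python
# Overloading "+" so an existing symbol gains an additional concrete meaning
# (complex-number arithmetic, as in the example above).

class Complex:
    def __init__(self, re, im):
        self.re, self.im = re, im

    def __add__(self, other):                  # extends the meaning of "+"
        return Complex(self.re + other.re, self.im + other.im)

    def __repr__(self):
        return f"{self.re}+{self.im}i"

print(Complex(1, 2) + Complex(3, 4))           # -> 4+6i
```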

A really good extensible language will allow considerable control over appearance, meaning, and efficiencies. From this standpoint, it’s hard to find really good extensible languages in use today.

It is worth pondering the additional responsibilities that come with really extensible languages. First, experience with these in the 60s and 70s shows that even good programmers tend to come up with poor extensions in the heat of programming and debugging. The result is a gobbledegook that is even worse than average code. Extensions are meta-designs and need more thought and skill than many programmers are willing to develop.

On the other hand, really good designers using a language with a really good meta-system can be highly effective in just the opposite direction of gobbledegook. Still, even good extensions can impose a learning curve that may incur more human cost than the benefits (this is the flip side of the benefits of DSLs as well, and a good extensible language can make excellent DSLs).

The best extensible languages I saw in the 60s were Ned Irons’ IMP (CACM Jan 1970) and Dave Fisher’s CDL (CMU thesis 1970). These were both procedural, and were mostly concerned with appearance and meaning. In thinking about “objects” as “intercommunicating processes via messages” it occurred to me that if an object could receive a message as it saw fit, then each kind of object could have a minilanguage for its interface. This fit well into how Lisp 1.5 (and later) were starting to use FEXPRs (giving the unevaluated arguments to a function to the function so it could figure out what to do). Later, in the 70s, this idea — at the string level — was the basis of Doug McIlroy’s “pipes” programming ideas.

I used a variant of the Lisp idea plus a few insights of Dave Fisher in the first Smalltalk design. The body of a “class” definition was a kind of parser that allowed a wide variety of forms to be devised. This worked quite well (and it also had the “Tower of Babel” characteristics that required some care to make the extensions be of great benefit rather than just creating a lot of noise).
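To give a feel for “each object parses its own messages”, here is a rough sketch (my own, and not Smalltalk-72 syntax) of an object that receives its message as uninterpreted tokens and decides for itself what they mean:

```python
# Flavor of the class-body-as-parser idea described above: the receiver gets raw
# message tokens and interprets them as its own little language.

class Turtle:
    def __init__(self):
        self.x = self.y = 0

    def receive(self, *tokens):
        tokens = list(tokens)
        while tokens:
            word = tokens.pop(0)
            if word == "forward":
                self.y += tokens.pop(0)        # this object decides "forward" takes a number
            elif word == "home":
                self.x = self.y = 0
            else:
                print("not understood:", word)

t = Turtle()
t.receive("forward", 50, "forward", 25, "home")
```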

For deep meanings and efficiencies, some of the pioneering work was done by Brian Smith’s “3-Lisp” reflection ideas, the later Smalltalks, and the “Meta-Object Protocol” ideas of Kiczales, Bobrow, des Rivières, etc. that had Common Lisp as a target. The approach of the latter was especially nice because it allowed the compiler to see deep extensions and, when a good idea, incorporate them at the same levels as the most critical deep code. Fisher had a few elements of this as well.

A key idea here is that when “meta” is made available, really great debugging and graceful dealings with errors are critical. This should include “fences” that initially hold off meta-changes unless the programmers are really sure that they want to deal with these levels.

The philosophy here is a subset of a larger Parc observation about times “when a 2nd order theory is the opposite of a 1st order theory, and they are both true”. For example: the 1st order theory about not diving down the black hole of making one’s own hardware and software tools is true — but the 2nd order theory “if you have the chops to make your own tools then you should” is also true. Parc was partly based on making the 2nd order theory work.

Why do you need to use command prompt?

There are a number of good end-user solutions using GUIs that lie between pure text command lines and highly graphical programming.

With respect to graphical programming, it’s important to note that Ivan Sutherland’s Sketchpad — my vote for the first real interactive graphics system (1962) — was programmed graphically (in terms of telling the system to “figure out how to make this happen” — using constraints).

image

Ivan Sutherland making a bridge simulation from scratch by drawing and programming with constraints.

Ivan’s older brother Bert — after getting out of the Navy — did a PhD thesis (ca 1965) called “Online Graphical Specification of Procedures”, which was my vote for the first data flow and “wired up” components graphical programming system.

image

A graphical program for square root that creates a new component (in the upper left quadrant) using connectable dataflow programming.

Just a little later (ca 1968) a very usable flowchart programming system — GRAphical Input Language (GRAIL) — was done at RAND by Tom Ellis et al.

https://youtu.be/QQhVQ1UG6aM

This system was capable of larger scale programming as well as simple programming for end-users (which might be a single screen-full of wired up things to do). Each element can be tapped on to reveal its programmed meaning, also in terms of flow diagrams. This allowed them to program the entire GRAIL system in itself. For debugging, the diagrams can be run slowly (there was a speed control) and would show the actions through the diagrams step by step.

Another constraint system that followed in the footsteps of Sketchpad was ThingLab by Alan Borning (ca 1977) done in Smalltalk at Parc.

Another early dataflow system was Fabrik, done by Dan Ingalls et al. in Smalltalk in the early 80s.

An exemplary system at “industrial strength” was the graphical dataflow system “MetaFour” done in the late 80s by Parc alums Dave Liddle, Yogen Dalal, et al. which was used heavily for a number of years by large companies — e.g. Procter and Gamble — initially for “data mining”, and then for many other applications. The end-users graphically connected “data sources” (represented as a “node”) to “data filters” and “data massagers” (also represented as nodes). The system was supplied with a large repertoire of useful nodes — some of which allowed further programming in various ways, including the ability to encapsulate a whole dataflow diagram system. This system was nicely matched to the end-users and was supplied with large CRTs that provided enough real-estate for the graphical programs.

As with all programming systems, graphical programming is generally much easier to write than to read. Because there is less directionality in 2D or more D, reading can be extremely difficult (like reading electronic circuit diagrams).

Alan Borning’s ThingLab solved part of this problem by allowing dual mode operation: generally one programmed by graphical building, and a visitor would understand by moving back and forth between the construction and a textual rendering that was more narrative-like. This was a good solution.

The MetaFour system’s solution was also a very good one in that the problem domain and how it was handled plus the available hardware, display, etc., were all well matched up.

The advent of wide angle VR could make a big difference. Graphical programming requires more real estate, and is much more effective with larger viewing angles (this is also true for good UIs for textual programming, just more so). My guess is that an attempt at a new system today by really good designers and programmers who are willing to do the extra UI work needed (these are in very short supply these days) could produce quite a breakthrough in “more-intuitive programming”.

However, programming is not the same as good design — either for functionality or appearance — so at some point computerists really need to buckle down to learn how to do good systems design regardless of the means of programming …

Which books should I read to learn about the stories and anecdotes which led us to “modern” computer science?

I think the 20 or 30 books I have read about this are far less than half the books published about the history of computing. Most of the ones I have read are flimsy and inaccurate. The easy part of this for me to check was that the books that featured incidents I was a direct witness to or a direct part of are quite inaccurate.

There are a few that are worth the effort. For example, by far the best book about the work that led to personal computing, the GUI, Ethernet, Internet, etc. — the work of the ARPA/Parc research community, etc. — is “The Dream Machine” by Mitchell Waldrop. There are inaccuracies in this history also, but Waldrop spent 5 years researching this book, and the result is a pretty good job overall.

There are a number of technical histories written by the researchers themselves, most of which have been well vetted. For example, the “History of Programming Languages” series set up and published by the ACM gives a pretty good picture of how individual programming languages were invented and made. The ACM also published a “History of Workstations” book (and has published the videos of the public talks by many of the key researchers).

As for the term “computer science”, I take it literally and in its original intended meaning (where “science” is supposed to mean “science”). How this all came about would be a fascinating and worthwhile book, and there could be one that I’m not aware of.

What does Alan Kay think of Dynamicland?

A terrific and beautiful world of ideas and experiences!

How usable was composition between widgets in Smalltalk 80 on the Xerox Alto? Could full applications be developed this way?

The answers to each question are somewhat different. First, there weren’t really “applications” of the stove-piped variety people are used to today. Every object was encapsulated and could be a component of a system, and every object was viewable (and views were also objects (every kind of thing was an object)).

So, if we look at an Alto screen shot of an early version of Smalltalk (ca 1974), we can see a superset of the kind of editor I’m using to write this answer.

image

The differences are that in that system I could both use multiple fonts and make my own font characters and whole fonts as needed (the little view in the lower left is a view of a character that lets one edit it). That would be nice to have these days.

Each class with visible views would carry the needed editors as part of the class definition.

Another example of this (and another difference) is that the illustration which is inserted in the galley is being sketched in place — you can see that the UI for this sketch object has sprouted around it, like a “halo” or “frame”. So “pictures” are also active entities that can be “edited”. That would also be nice to have these days.

The way the galley editor was made was to create a galley class, whose objects were held in a sequence, and whose views were in the same sequence and which subtended the size of their visible contents (worth noting that this is the same general kind of thing as a string and its string view, or a paragraph and its paragraph view). Any view of any object can be part of any composition.

In turn, the galley editor can be used as a component of a more complex organization.
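A very rough sketch of that compositional idea (my own pseudo-example, not the original Smalltalk code): a galley just holds a sequence of objects, asks each one’s view for its size and display, and is itself a view, so galleys can nest inside galleys:

```python
# Compositional sketch: any object that supplies a view with height() and
# display() can sit in a galley, and a galley is itself such a view.

class TextView:
    def __init__(self, text): self.text = text
    def height(self): return 1
    def display(self): print(self.text)

class SketchView:
    def __init__(self, name, h): self.name, self.h = name, h
    def height(self): return self.h
    def display(self): print(f"[drawing: {self.name}, {self.h} lines tall]")

class Galley:
    def __init__(self, views): self.views = views
    def height(self): return sum(v.height() for v in self.views)   # subtends its contents
    def display(self):
        for v in self.views:
            v.display()

page = Galley([TextView("Once upon a time ..."),
               SketchView("Frodo and Gandalf", 3),
               TextView("... they walked on.")])
page.display()
```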

More fun — and why we invented MVC — is the multiple viewing-editing architectures of complex tools. Here is the first one — done by the amazing Trygve Reenskaug — showing a planning system where the different views are looking at the same plan in different ways (this would be ca 76–77).

image

We can see a critical path view at the top right, and below it a “Gantt view”. In the lower left is a simple list view of the plan-stage objects. A selection there will select the same object in all the views that show it.

The user is about to manifest another view of the same plan. Each of the views is independently editable and will cause the desired changes in the underlying model.

As a final example, here is a demo in a revived Smalltalk from 1978 that gives a sense of what kinds of things were done back then (this is also the Smalltalk version that Steve Jobs saw in his famous visit to Parc).

https://www.youtube.com/watch?v=AnrlSqtpOkw&t=135s

About 10 minutes into this there is an example of coordinating two kinds of visible objects (painting and animation) to make a richer tool.

The first question was: how usable was this? It was pretty usable for developing programs and systems, and could have been much more usable for casual end-user combinations. This is still an interesting set of problems that have not yet been solved all together. Hypercard contributed some important ideas for end-user scripting (though it wasn’t object-oriented enough to allow this kind of composing). Quite a bit more in some important areas was achieved by the Etoys system (done in an open source version of Smalltalk called “Squeak”).

Why were computers like the DEC PDP-10 called minicomputers, since they weren't really that small?

The PDP-10 was a “mainframe computer” and was not called a minicomputer. The PDP-8 was called a minicomputer, as were most of the early PDP-11s.

One angle on the name, which to some extent correlated with physical size, was the “word size” of the machine. Most of the commercial mainframe machines starting in the late 50s had word sizes of 36 bits to 48 bits. The IBM System 360 computers were 32 bit mainframes. Etc.

One of the earliest commercial “smaller” computers was the PDP-1 (introduced ca 1959–60) which had an 18 bit word, and an instruction set that resembled its ancestor Whirlwind (on which Ken Olsen and other eventual founders of DEC had worked in the early 50s). I’m pretty sure that we called the PDP-1 a “mini”.

image

One of my heroes — Steve Russell — the original implementer of both Lisp and SpaceWar — with the restored PDP-1 at the Computer History Museum

Interestingly — and to the point of the above — the SDS-940, a small computer with a 24-bit word and half a MIP, was referred to as a “mainframe” (it was the machine that ran Engelbart’s “Mother of all Demos”).

How did humanity find out how to make fire?

We don’t know.

Most of the answers that speculate that fire from natural causes was carefully preserved long before making it was discovered are almost certainly correct.

The American Indians did this in a number of ways, and the practice was continued in many cultures after humans could make fire, both for ritual reasons — the “keepers of the flame” — and for practical reasons — it is generally a lot easier to preserve a glowing coal than to make a glowing coal.

As for discovering how to make fire, it would help most people to first learn how to make a fire from scratch using three or four known early methods. This will impress on you just how many things have to go right when you are really trying to do it.

A good one that does have most of the things present, and has a good chance of producing the very same kind of thing as the preserved glowing coal, is drilling holes in beads, bone, wood, etc. The only answerer that I noticed had mentioned this possibility was Mark Werner.

There are several good YouTube videos showing both a hand drill (it’s amazing that this works as well as it does) and a bow drill, which is my favorite likely answer to this question for most cultures that did discover how to make fire. Drilling something with a hand or bow drill has the nice property that the cast off material is finely granular and thus really prone to combine frictional heat with oxygen in the air to start combustion. The result will resemble the precious preserved glowing coals that the “keepers of the flame” are tending. From there the process of getting to a real fire is the same: tinder and blowing and then kindling.

https://www.youtube.com/watch?v=ETzcTwxLx3M

Is 60Hz not fast enough for a VR headset?

There are a number of human vision characteristics that need to be handled really well to make VR a good matchup.

(1) Foveal vs Peripheral acuity — most of the acuity of a human eye is in the small foveal region of about 2x3°. We are pretty much legally blind outside this region. For vision, our eyes saccade (scan) the visual field and our brain stitches this info together and combines it with what we believe is there to create what we call “reality” (which sometimes is not close).

McDonnell-Douglas did a VR headset years ago that tracked the eye and put hi-res into the foveal region. If this is not done, then very high res needs to be used for the visual field that saccades can reach.

(2) Foveal vs Peripheral sensitivity to brightness change — to help deal with the large differences between foveal and peripheral acuity, the peripheral vision is many times more sensitive to a brightness change. These also cause the eye to saccade to bring the fovea to bear on what might have caused it (this is probably a sensitivity to “attack from behind”). This can happen for some people at about 120 Hz and above, and most will have some sensitivity at 90 Hz.

And, note that the need for wide angle in VR (and in physical displays for computers, TV and movies) will affect the peripheral sensing much more than narrow angles. 60Hz was originally picked for several reasons: a big one was that it worked well enough for tiny TV screens across a living room.

With wide screens and low frame rates the viewer’s eyes will be saccading violently to the corners of the screens, and this can cause vertigo and headaches. This can be made extreme by also having large pans and zooms in the content, either by head movement or by movie/tv content (these force brightness changes in the periphery also).

(3) Tracking — lags in tracking can cause vertigo, as can too low a tracking rate (they have some overlap). This is often more serious an effect than too low a frame rate for the images (and they overlap).

(4) — Focus and Stereo — most headsets still focus on the display element in a way that the eye recognizes as “close up”. If the viewer wants to look at a “far away” object their lenses will “focus out” but this won’t work. This is also a source of headaches. If an object “gets close” (within about “8 feet”) the eyes will try to converge, and this can get painful if the object is presented as “really close”.

Bottom line is that many people are affected in numerous ways by the so-far poor human factors of VR headsets, though many are willing to put up with them. The variation in humans means that some will really be affected by one factor or another to the point of not being able to use them.

---

Addendum to (2) — added Aug 29 2023

It’s very worth noting what Jeri Ellsworth — a super creative engineering mind — did with the Tilt 5 HMDs (look up “Tilt 5”). She wanted to be able to run her unique table-top “corner-reflective” display from personal computers that might only have 60fps. She knew this would not be enough, so she designed and made a special interpolating chip that takes in 60fps and outputs 180fps, which, for most cases, will provide a much more realistic input to the human eye. The wider the angle of view, the more the frame rate needs to be attended to (this was partially advanced for wide screen movies by using a spinning shutter that shows each frame of film 3 times — but without interpolation (which would help)).
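A toy sketch of what turning 60 fps into 180 fps by interpolation means (my own illustration; the actual Tilt Five hardware is of course far more sophisticated and uses much better interpolation than simple blending):

```python
# Toy frame interpolation: between each pair of 60 fps frames, synthesize two
# blended frames so the display can run at roughly 3x the rate. Real
# interpolators estimate motion; blending is only to show the idea.

def interpolate(frames_60fps):
    out = []
    for a, b in zip(frames_60fps, frames_60fps[1:]):
        out.append(a)
        out.append([(2 * pa + pb) / 3 for pa, pb in zip(a, b)])   # 1/3 of the way to b
        out.append([(pa + 2 * pb) / 3 for pa, pb in zip(a, b)])   # 2/3 of the way to b
    out.append(frames_60fps[-1])
    return out

print(interpolate([[0.0], [1.0]]))   # -> [[0.0], [0.333...], [0.666...], [1.0]]
```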

Why do people say that Ada Lovelace was the first programmer, when Babbage wrote the first programs for his own machine?

Goethe said “We should all share in the excitement of discovery without vain attempts to claim priority”. He meant not just “early” but any time an important idea is realized by a human being.

The pop cultures — and to some extent the more developed cultures — try to create interest through hype, and “being first” or “claiming first” is one of many ploys.

What we should all celebrate is that both Babbage and Ada were able to realize and understand a number of important ideas that most people today still don’t understand. We should celebrate Ada’s abilities to explain, to use mathematics and to write programs regardless of whether Babbage was also a mathematician and could write programs.

We should mourn that neither were able to get more understanding and support for these ideas.

We should regret that Ada died so early. And that Babbage was too stubborn to change his initial ideas about how to make a working computer (there were far better routes available to him in the last 20 years of his life).

And so forth.

And I don’t think we should worry much about “what people say” (they say lots of things for their convenience …)

What does Alan Kay think about Joe Armstrong claiming that "Erlang might be the only object-oriented language" and also his thesis supervisor’s claim that “Erlang is extremely object oriented”?

I love Joe Armstrong — we lost a great man when he recently left us.

And, he might be right. Erlang is much closer to the original ideas I had about “objects” and how to use them.

However, another way to look at this is to consider “What Is Actually Needed” (WIAN) and realize that much more is needed beyond what we are programming in today.

Joe would most definitely be more in favor of this idea than worrying about what either one of us did decades ago.

Who are the most influential people in the Computer industry?

I think there are too many for a fair account in a Quora sized answer, especially if we give credit to insights regardless of whether they were followed up.

I think of “computing” as processes that help one deal with ideas. And this covers a lot of ground (including various forms of mathematics, etc.) So: computers are things that manifest processes that help one deal with ideas.

We don’t know who invented the abacus — probably parallel inventions in a number of cultures — or the later Antikythera device.

One way to do this is to emulate Physics and pick just a few names that would be on any list, and apologize profusely in advance for inadvertent omissions.

This is my personal pantheon of heroes with regard to written store program digital computers.

Boole made a difference because his ideas were picked up and used by Shannon to create a “math” and a “methodology” for designing complex switching logics and circuits. (I love what Charles Sanders Peirce did with Boole’s book — including coming up with NAND — but the results were buried in his notebooks and missed being noticed and influential.)

Turing made a difference by creating profound intellectual foundations for stored program computing (he didn’t invent the idea, but he did invent and explain the cosmic scope of the idea).

I love what Konrad Zuse did, but he was not in any direct historical line. Similarly: Stibitz.

Von Neumann made a difference by lending his reputation and interests to making real stored program computing devices (much of the contribution here was due to others, but JvN was a key factor).

Leaving out the secret Bletchley Park work, we have Max Newman, Williams, et al at Manchester and Maurice Wilkes et al at Cambridge actually getting stored program computers working within a few months of each other in 1949.

This era is crowded with early starts, and includes Eckert and Mauchly with Univac I (ENIAC doesn’t count as a stored program computer), and a number of university computing builds all based on the “Johnniac” designs (e.g. Illiac I, etc.)

Jay Forrester and Bob Everett created Whirlwind ca 1950, one of the first parallel (and fast) computers for real-time interaction, plus displays with a pointing device (the light gun). Core memory was invented on this machine and was an enormous breakthrough in practical computing. Laning and Zierler did the first higher level language — George — on Whirlwind, and this was also real-time and interactive.

The first higher level language I liked was Algol 58, which was partly done as a reaction to both Fortran (John Backus) and the precursor of COBOL (Flowmatic by Grace Hopper). Prominent names associated with Algol 58 were Backus, Fritz Bauer, Alan Perlis, etc.

John McCarthy for Lisp and the Advice Taker (late 50s).

Bob Barton for the B5000 architecture (ca 1961)

Ivan Sutherland for Sketchpad (ca 1962).

I’ll stop there, both on general principles, and also because I started ca 1961, and “history” to me was before that time ….

And also because things get really complicated from the late 50s onwards … The ACM Turing Award (started around 1965) commemorated some of the heroes, but missed many more.

Violating the above good idea for stopping, I think I must mention Margaret Hamilton for truly singular important contributions in the period 1961 through 1969: she wound up as the eventual leader of the programming effort for the Apollo Guidance Computer, and her revolutionary “real software engineering” designs saved the Apollo 11 moon landing. She has been recognized with the NASA Prize, the Presidential Medal of Freedom, and as a Computer History Museum Fellow, but these still seem a bit short of what is deserved.

Is traditional engineering becoming a pop culture due to the modern practice of software development?

I don’t know, and I suspect not in the way that computing has been going.

Much of traditional engineering is life threatening unless done carefully and scrupulously, and most engineers really take their calling seriously.

For example, I think that most engineers are just sick about the 737 Max horrible doings, and the full story is likely to be quite ugly when all is revealed. Certainly, almost no traditional engineers would ever cut corners, create single points of failure, omit user training, and all the rest.

Another part of traditional engineering is that much of it deals with the physical world in which materials cannot stand up to strong forces — so care and craft have to happen. In the computer, the “forces” are weak, and the gossamer cobwebs can take quite a while to crash — and most crashes in the past have not had human lives at stake.

Looking into the future, more and more software is starting to affect human safety, and more traditional engineering care will be needed. I don’t see that “computing” has much of a tradition of being careful in this way, and certainly the systems building tools and methods are not very good at helping “real engineering practices”.

Is Archibald Putt a creation of Alan Kay?

Not this Alan Kay …

What are Alan Kay's views on automation and technological unemployment?

I think the issues revolve around a society’s conception of itself, its main societal goals, and the tradeoffs between our genetic heritage, the traditional cultural views of “reality” and mores, and the more recent inventions that are more distinct from the past (modern science, ideas about equality and human dignity, etc.).

“Ethics” and “morality” can use human sciences to help reason, but in the end they are more like math in that they are generally formed by choosing basic principles and then using reason to ramify out ideas about good behavior.

For the current question, we could imagine choosing as one of the basic principles of a society, the idea and goal of human dignity (this was a frequent and useful theme of Martin Luther King).

We can look at the variation of abilities, talents, motivations, circumstances, etc. in a population and ask: how can a society permit generally useful changes while preserving its values? (I’m picking “dignity” here as a stand-in for all of them.)

So we need to ask about the general propensities and abilities of adults to qualitatively change their behaviors in mid-life, and ask about the abilities of schools to change how they are trying to shape and help children to become adults (and no small part of this requires the adults in schooling to make similar large changes as above).

To oversimplify, we can think of societies as having something like a viscosity that damps down accelerated changes in attitude. Science and technology, by contrast, have much less “viscosity”: they are mainly based on knowledge, method, and technique — they were quite opaque just a few hundred years ago, but only because the “how” was invisible. Now that the “how” has been unveiled, it is relatively easy to make progress.

We are in the middle of mismatches between older rhythms of living — including ideas of cooperation and competition — and the new methods. The mismatches do and will cause great disruption and unhappiness for some large parts of a society, so the question could be “should we try to smooth out unhappiness by distributing it more evenly, or should we allocate it just to one sector so that everyone else shall be ‘happier’ ?”

Rampant capitalism allocates happiness in one direction, socialism and communism allocate differently. From the standpoint of systems design, none of these past systems have been set up well — in part because they ignore the larger factors mentioned in the beginning here.

To bring this back to our current situation, systemic change for the majority of human beings is mostly generational, and requires generational changes in the education system. If dignity is one of the values that need to be preserved, then the society has to make plans and allocate resources to head off simple but catastrophic disruptions that simple but catastrophic competition and US versions of capitalism encourage.

The current drivers of society almost completely miss that civilization and progress were set up in very large part to transcend human genetics — in other words, we humans want and need much more than “survival of the fittest”. We don’t want bad eyesight to doom a person — we invent eyeglasses. Similarly we don’t want “bad eyesight” to doom a society — we have started to invent better ways to organize societies and to give us “more sight” through science, deeper thought, and more careful planning and simulation.

We have invented larger values than “survival of the fittest” and we need to find the means and the will to make them flourish in a richer conception of society.

How and why did Xerox fail to capitalise on PARC's inventions?

There are good enough answers under this Quora link (this is an example of a question that has already been asked and answered).

Why couldn't the management of Xerox see the goldmine that they had with inventions at Xerox PARC? Was it really about lacking money to capitalize? How about stock issuance?

What does Alan Kay think of the Tablon and similar portable practice devices (I carry mine with me almost everywhere)?

Hi Lawson

About 6 or 7 years ago my left thumb base joint gave up the ghost (arthritis, bone spurs, loss of cartilage, etc) and I had to give up guitar. About a year ago I heard a wonderful lute player — Thomas Dunford —

https://youtu.be/Y6VZ2mk5K5M?t=39

and was so taken with his playing that I asked the Lute Society in the UK about thumb-bracing on lutes and was told that it is much less than on a guitar.

I eventually wound up with a really great instrument — a 7 course Renaissance style lute of the kind that was starting to be played around 450 years ago, just as the Shakespearean age and the golden age of English lute music were beginning.

image

It is as light as a feather and very portable compared to a guitar, so I have been taking it with me on my travels. I’ve been able to avoid re-inflaming my thumb joint by paying constant attention to how it is feeling (it is always signaling) and stopping to give it a rest many times each day. This is working pretty well, and it is a wonderful experience to get back to an instrument that is more intimate and expressive than keyboard instruments after thinking I would never be able to play these again.

It is surprisingly different to learn to play compared to a classical guitar (though there is some transfer). The right hand and arm are oriented so the fingers point up along the strings (rather than across from the top as in a guitar). The playing is with fingertips rather than with nails, and one is playing on double strings (except for the top string). The range of tone and nuance (and difficulty) seems to be somewhere in between a guitar and a violin (one feels that one’s fingers are each a little bow). The direct touch of fingertips on both hands is extremely intimate and pleasurable.

The frets are made of gut, and tied on. They are much lower than guitar frets and this requires quite a bit of relearning of the left hand — in many cases, especially for chords, you have to get your fingertips closer to the frets to get a clear good tone. Not quite as demanding as a violin with no frets, but a lot more demanding than a guitar. A super benefit of the gut frets is that they can be moved to get pesky intervals — like 3rds and 10ths — into great tune (the poor tunability of guitars, which are set up for equal tempered intervals, has always bugged me). On the lute you can adjust — even slant — a fret or two to make the sound so much nicer.

I should say a word about lute timbre and “presence”. This has been the largest revelation and surprise. If you play 3 or 4 part harmony on a piano the sound is “thin” compared to 4 brass instruments playing the same notes. Guitar is “pretty darn thin” compared to most orchestral instruments.

The lute is not very loud and doesn’t have a lot of sustain, but it is amazingly “not thin”. There is something about the timbre that is “memorable” so that a line that is broken by notes dying away or by hand movements can still be sustained in the listener’s mind because of the continuity of “presence” in the timbre (you can hear this very distinctly in Dunford’s playing). Sparse voicings that would not sound great on guitar sound very good on the lute. I find this quite magical as I’m gradually learning to play some of these pieces.

Some lute pieces demand quite a bit of technique, but a Tablon for the lute wouldn’t help what the right hand has to learn. But I’ve found that the light weight and small size of this style of lute is very amenable to traveling when I have to.

How can I help develop a wiki-Pharo? (Many people using and editing the same Pharo environment image in real time.)

I suggest you not worry about Pharo — or any other programming environment — but do ponder scaling, “number of cooks”, dependency, and coordination issues.

One of the ways to do this is to look at complex structures, both in nature (Biology) and human-made (cities, countries), to puzzle out how and why the scalings worked.

Cities might be a good place to start. How does “independence of thought” trade-off with “coherent esthetics”, “needed services”, etc.?

What degrees of freedom should be allowed for different levels of things and processes? For example, can anyone in a city just decide on a whim to change the voltage in electric power services? Can a city law require all houses to look the same? Can a city law require certain styles to be followed? Or: can a city law require that no locks be allowed on private dwellings? Or no privacy … etc.

Most programming languages have ways to request services. A key issue is what services can be depended on. The ones made locally are under local control — and can still have bugs or be “improved” in ways that were not assumed by the service user — but these are relatively easy to deal with. But what about services that are supplied by the environment? How does one know that one’s assumptions about the services are accurate enough to allow use — and how can one re-vet when a global service has a new version?

It is pretty annoying that in 2019, there is not a typing/interface scheme that is semantic enough to deal with the coordination of remote services and dependencies.
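As a small illustration of the “vetting” problem, here is a minimal sketch in Python with made-up names (ServiceContract, “sort-service”, etc. — nothing here is an existing Pharo or Smalltalk facility): the consumer records the assumptions it relies on as executable checks, and re-runs them whenever the provider announces a new version.

```python
# A minimal sketch of "vetting" a service you depend on. All names are
# hypothetical; this is only an illustration of the idea discussed above.

class ServiceContract:
    def __init__(self, name, checks):
        self.name = name            # which remote/global service this covers
        self.checks = checks        # list of (description, callable) assumptions
        self.vetted_version = None

    def vet(self, service, version):
        """Re-run every recorded assumption against the (new) service version."""
        failures = [desc for desc, check in self.checks if not check(service)]
        if failures:
            raise RuntimeError(f"{self.name} v{version} broke assumptions: {failures}")
        self.vetted_version = version

# Example: a consumer only assumes that "sort" is ascending and keeps duplicates.
contract = ServiceContract("sort-service", [
    ("sorts ascending",  lambda srv: srv([3, 1, 2]) == [1, 2, 3]),
    ("keeps duplicates", lambda srv: srv([2, 2, 1]) == [1, 2, 2]),
])

contract.vet(sorted, version="1.0")   # would raise if a "new version" broke these
print(contract.vetted_version)        # "1.0" — the recorded assumptions still hold
```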

If I were going to try to do what your question asks, I would aim at the two big problems of (a) “governance” and (b) assumptions and dependencies.

Then I would make a programming system and environment that fit these needs. The methods used to invent Smalltalk (Pharo is a version of Smalltalk) at Xerox Parc will work well for creating What Is Actually Needed (WIAN). I have outlined some of the first steps above …

Rich Hickey, creator of Clojure, has said that object-oriented programming is unable to model “time” properly. What does "modeling time" actually mean?

Most languages don’t model time — and worse, let the CPU determine what time means.

But the need for it, and a variety of solutions, have been around for at least 60 years now.

Part of John McCarthy’s long ago insight (that after 60 years I wish computer people had grokked by now) is that you just have to model time along with everything else that is modeled. Then you wind up with parallel histories that have the memory of the things we want and need, and also the memories of when. This renders “mutable” and “immutable” moot (“mute?”) and beside the point.

John’s desire to have both unambiguous reasoning about situations and actions and causes and to have changes over time led to his powerful ways to deal with these. I think most of the confusions today stem from too limited a view on computing — very likely from taking programming practices as “computing” (whereas they are distressingly distant from each other at the present time).

I’ll get back to this in a bit. It’s worth noting that forms of dealing sensibly with time as histories have existed in the database world for many years, in the forms of versioning, atomic transactions, etc. The programming language Lucid had all variables retain histories, etc.

All of these allow one to picture the progress of a whole system as the world lines of what it uses for state changes. Could be “data”, could be “objects”.

A key notion and issue that arises is to ask questions about “consistent states” in the history (McCarthy did). If we think of the world lines as progressing upwards, then “consistent states” are horizontal “layers” or “surfaces” (John called them “situations” — AI sometimes called them “frames”). The idea is that we want to know the pseudotimes when the whole system can be considered consistent enough to allow reasoning. The progression of “situations” represents the transitions in what is essentially a phase space. From John’s point of view, the identity of a situation allows logical reasoning to be done consistently.

Note: the database world has very ad hoc to no solutions to “situations” across databases (but it’s clear that deductions drawn from databases with different dates and versions might not be reasoning within a situation).

My long ago approach to “objects” was partly inspired by Sketchpad and Simula I, both of which had what could be called a “simulation approach” to modeling things. (We can see that this is quite compatible with what John was trying to do with fluents and his robot programming and reasoning.) If you have a simulation approach and use fluents and situations, then you can think about getting your objects to the next consistent situation by letting them use the current situation as inputs to what is essentially a kind of functional programming (or logical reasoning) to “deduce/produce” the next consistent situation.
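As a rough illustration only (my sketch, not McCarthy’s formulation or any particular system’s code), here is what “keeping the world lines” can look like in a few lines of Python: each fluent keeps its whole history keyed by a pseudotime, and the next consistent situation is computed purely from the previous one rather than by overwriting values in place.

```python
# A tiny sketch of fluents, pseudotimes, and consistent "situations".

class Fluent:
    def __init__(self, initial):
        self.history = {0: initial}            # pseudotime -> value

    def at(self, t):
        """Value as of pseudotime t (latest entry not after t)."""
        return self.history[max(k for k in self.history if k <= t)]

def step(fluents, t, rules):
    """Produce situation t+1 purely from situation t; nothing at t is overwritten."""
    current = {name: f.at(t) for name, f in fluents.items()}
    for name, rule in rules.items():
        fluents[name].history[t + 1] = rule(current)
    return t + 1

# Example: a ball's position and velocity advance together, consistently.
world = {"pos": Fluent(0.0), "vel": Fluent(1.0)}
rules = {"pos": lambda s: s["pos"] + s["vel"], "vel": lambda s: s["vel"]}
t = 0
for _ in range(3):
    t = step(world, t, rules)
print(world["pos"].history)   # {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0} — full history retained
```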

We really wanted to do this at Xerox Parc but our computers were much too small even for relatively short histories to be retained. However, various systems have done this since. For example the Croquet system done by Viewpoints about 15 years ago.

And those who are interested in this idea should definitely take a look at Leslie Lamport’s many papers on these issues, and David Reed’s 1978 MIT Thesis on “NETOS — an operating system for the Internet”. And at David Jefferson’s work. Etc.

When one starts looking at things from this point of view, it should become apparent that object techniques (using encapsulation, messaging, reflection, etc.) are well suited to being upgraded to having a full model of “time”.

How can I learn more about the original spirit of computing (as alluded to by Alan Kay, et al.)?

I take “the original spirit of computing” to mean “for the fun and the art of it”, and the “art” part meaning to create better works of art at all levels, especially the foundational levels.

If we take “computing” as “manipulating information to satisfy goals”, then it’s been around for quite a long time, both for “fun and art” and for “purposes less fun” (for example for business and the military).

So, I think of “the original spirit” as always coexisting with much more pragmatic applications. And I think it is still around in various forms. An interesting question would be whether the percentage of people doing “original spirit” stuff is smaller than in times past (I’m guessing yes).

Another question is can those who want to do “original spirit stuff” find avenues for this?

I think the answer is generally “yes”, and also in many ways “easier than in the past”. This is because the casual amount of computing capacity available to everyone in a laptop, etc. is enough to do almost any kind of prototype. Just for one example, take a look at Anselm Eickhoff’s “Citybound” simulation/game system, programmed by him (via making a new kind of object system and then building millions of concurrent objects on that, all running on a laptop).

https://www.youtube.com/watch?v=LiIoE8ArACs

This is enough local capacity to prototype ideas of Internet scale.

For HW assist needs, it is possible to buy (for about the cost of another laptop) FPGA accelerators — which can plug into your laptop — on which supercomputers can be defined to help one’s software needs.

The tradeoffs in the past were (a) tiny slow hard to build expensive machines vs. (b) great exploratory research funding from ARPA, ONR, etc.

That great funding helped to bring the vast resources of today but the great funding for small research teams is not present at this time.

But if you want to see “The spirit of computing” alive and well today, take a look at Anselm above …

Experienced programmers and computer scientists, what are some really old (or even nearly forgotten) books you think every new programmer should read?

I love that “2006” and “2008” (in another answer) must be considered “really old” (which is what the question requests) …

I’m still a big fan of the “Lisp 1.5 Programmers Manual” (MIT Press — still in print). This version of the language is no longer with us, but the book — first written ca 1962 by John McCarthy, who invented the language, and his colleagues, who implemented it — is a perfect classic.

It starts with a version of John’s first papers about Lisp, and develops the ideas in a few pages of examples to culminate on page 13 with Lisp eval and apply defined in itself. There are many other thought provoking ideas and examples throughout the rest of the book.

The way to grow from this book is to deeply learn what they did and how they did it, and then try to rewrite page 13 in a number of ways. How nicely can this be written in “a lisp” using recursion? How nicely can this be written without recursion? (In both cases, look ahead in the book to see that Lisp 1.5 had gotten to the idea of EXPRs and FEXPRs — functions which don’t eval their arguments before the call, and thus can be used to replace all the “special forms”. Do a Lisp made from FEXPRs and get the rest by definition, etc.)

What is a neat bootstrapping path? How could you combine this with Val Schorre’s “Meta II” programmatic parser to make a really extensible language? What does it take to get to “objects”? What are three or four really interesting (and different) ways to think about objects here? (Hints: how many different ways can you define “closures” in a language that executes? What about using Lisp atoms as a model for objects? Etc.)
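As a study aid only — a compressed sketch in Python rather than Lisp, and not the book’s actual definition — the “page 13” kernel boils down to a mutually recursive eval/apply pair over s-expressions:

```python
# A tiny eval/apply over s-expressions represented as nested Python lists.

def l_eval(exp, env):
    if isinstance(exp, str):                      # variable
        return env[exp]
    if not isinstance(exp, list):                 # number or other constant
        return exp
    op, *args = exp
    if op == "quote":
        return args[0]
    if op == "if":
        test, conseq, alt = args
        return l_eval(conseq if l_eval(test, env) else alt, env)
    if op == "lambda":                            # (lambda (params...) body)
        params, body = args
        return ("closure", params, body, env)
    return l_apply(l_eval(op, env), [l_eval(a, env) for a in args])

def l_apply(fn, args):
    if callable(fn):                              # primitives
        return fn(*args)
    _, params, body, env = fn                     # closures
    return l_eval(body, {**env, **dict(zip(params, args))})

global_env = {"+": lambda a, b: a + b, "*": lambda a, b: a * b,
              "-": lambda a, b: a - b, "=": lambda a, b: a == b}

# ((lambda (x) (* x x)) 7)  =>  49
print(l_eval([["lambda", ["x"], ["*", "x", "x"]], 7], global_env))
```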

The idea is that Lisp is not just a language but a really deep “building material” that is tidy enough to “think with”, not just to make things with (it’s a “building material” for thoughts as well as computer processes).

Dani Richard reminded me to mention: “Computation: Finite and Infinite Machines” by Marvin Minsky (Prentice-Hall, 1967), which — since it is one of my favorite books of all time — I’m surprised I didn’t include in the original list. Marvin could really write, and in this book he is at his best. It is actually a “math book” — with lots of ideas, theorems, proofs, etc., — but presented in the friendliest way imaginable by a great mind who treated everyone — including children — as equal to him, and as fellow appreciators of great ideas. There are lots of interesting things to ponder in this book, but perhaps it is the approach that beckons to the reader to start thinking “like this” that is the most rewarding.

“Advances in Programming and Non-Numerical Computation” (ed. L. Fox), mid-60s. The papers presented at a 1963 summer workshop in the UK. The most provocative ones were by Christopher Strachey, and several by Peter Landin. This was one of the books that Bob Barton had us read in his famous advanced systems design class in 1967.

Try “The Mythical Man-Month” by Fred Brooks, for an early look and experience with timeless truths (and gotchas) from systems building with teams …

Try “The Sciences of the Artificial” by Herb Simon. A much stronger way to think about computing — and what “Computer Science” might mean — by a much stronger thinker than most today.

“A Programming Language” by Ken Iverson (ca 1962). This has the same thought-expanding properties as Lisp. And, like Lisp, the way to learn from these really old ideas is to concentrate on what is unique and powerful in the approach (we know how to improve both Lisp and APL today, but the deep essence is perhaps easier to grasp in the original manifestations of the ideas). Another book that Barton had us read.

I like Dave Fisher’s 1970 CMU Thesis — “Control Structures for Programming Languages” — most especially the first 100 pages. Still a real gem for helping to think about design and implementations.

More recent: (80s) “The Meta-Object Protocol” by Kiczales, et al. The first section and example is a must to read and understand.

Joe Armstrong’s PhD thesis — after many years of valuable experience with Erlang — was published as a book ca 2003 …

Lots more out there for curious minds ….

What does Alan Kay think about tension between sharing research methods and findings freely and early & preserving commercial viability so the findings can be made into products that change the world?

What do you think about the Internet? Etc.?

The question doesn’t really make sense as stated because sharing the best ideas freely creates the greatest opportunity for conversion into products, whether world changing or otherwise …

I.e. what “tension”?

The really big ideas — like those from the ARPA/Parc research community — can’t be reasonably developed by any single organization.

Really big ideas create new industries for the whole world.

Most business people and politicians do not understand this at all.

American computer pioneer Alan Kay’s concept, the Dynabook, was published in 1972. How come Steve Jobs and Apple iPad get the credit for tablet invention?

The Dynabook idea happened in 1968.

image

But the simple part of the idea — a personal computer on the back of a flat screen display with a stylus and touch sensitivity — is hard to consider a real invention given:

Flat-screen displays. I saw the first University of Illinois one in June 1968 (Bitzer and Slottow), and I knew about the work at both RCA and Westinghouse.

image

The RAND Tablet was invented in 1964 by Tom Ellis et al.

image

The Flex Machine (self portrait below) was started in 1967 (Cheadle and Kay). It had a calligraphic display, but we made a stylus-tablet for it.

image

Moore’s Law (first described by Gordon Moore in 1965).

The U of Illinois’ purpose for their flat-screen display was for a non-portable time-sharing system terminal. The flat-screen “newspaper reader” described by Arthur C. Clarke in “2001: A Space Odyssey” was just for consumption. And flat-screen TVs “soon” had been a staple of Popular Science for many years. These were not all that interesting. But thinking of using a flat-screen display for a really portable personal computer system was much more so.

I knew how many transistors were in the Flex Machine, so the question was “When could we get these on the back of a large enough flat-panel display?”. This was a lively topic of conversation at the first ARPA grad students meeting (1968, U of Illinois) after seeing the little plasma panel demo.

The answer we calculated was — if the US were to really try — about 10 years, but very doable, especially if a solution could be found for portable low power 2nd level memory.
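For flavor, here is the shape of that calculation in a few lines of Python; the starting count, target count, and doubling period are purely illustrative assumptions, not the actual 1968 figures.

```python
# Back-of-the-envelope Moore's Law estimate; all numbers are illustrative.
import math

transistors_now    = 10_000       # assumed logic scale of a late-60s desk-side machine
transistors_needed = 1_000_000    # assumed scale for a useful notebook-sized computer
doubling_period_years = 2         # assumed doubling period

doublings = math.log2(transistors_needed / transistors_now)
print(f"~{doublings * doubling_period_years:.0f} years")   # ≈ 13 years on these assumptions
```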

But the actual “Dynabook” idea came a few months later in the Fall of 1968 after I had visited Seymour Papert and Cynthia Solomon’s first LOGO classrooms. This changed my view of computers and personal computers from just “vehicles” and “tools” to “meta-media” and “for children in important ways also”.

In other words, what got added to the simple idea was “cosmic purpose”, “service”, “curricula”, etc. The big hit from Papert and Solomon was that careful use and design of interactive computing could make a qualitative difference in the higher-level shaping of children’s thinking — not just learning important things earlier, but in taking on a much more powerful “epistemological stance” towards the world they were growing up in — to the point where they should be able to think much better than most adults do today (not a big feat, but desperately needed), and be stronger shapers of the future (really desperately needed).

The context here was both the general woes of humanity, and also the sickeningly bad dynamics of the 60s, including the enormously daunting problems of race, the Vietnam war, and warnings about the environment.

Our project at Parc was built around trying to understand how to do the “larger purposes” of a Dynabook for what was inevitable technologically in the future. The first name for the Alto at Parc was “The Interim Dynabook”. Part of the goal of the project was to make at least 100 (we made about 1500) personal computers that were powerful enough to allow some of what a Dynabook would need to supply as “services” to fulfill the ideal. In the process we worked with hundreds of children and made many systems and user interfaces to find better matchups between them and the new medium.

image

Steve Jobs was very taken by the Dynabook idea, and wanted to do one when I joined him at Apple in 1984 — his rallying cries then were “Wheels for the Mind!”, and “The kids can’t wait!”.

Many years later, the iPad was good at the simple media consumption convenience part of the idea, but fell quite flat in many ways with regard to the two larger ideas of (a) user authoring of meta-media, and (b) an environment to help children learn powerful ideas by making and sharing them.

The latter was not in Apple’s goals at all:

users (even children) were forbidden to make actively programmable things on the iPad and share them on the Internet

there was no stylus for drawing with the machine, and years later when one finally was added they omitted a place to store it (!) — and even still they did not include proximity sensing (compare with the Wacom Cintiq).

fluent typing was not addressed in the design (we knew in 1968 that even with a touch-screen, a physical keyboard would almost certainly be needed). It is somewhat bizarre (even in this bizarre world) that Microsoft took more into account in its physical designs for the Surface …

most importantly, Apple made no effort to help children by funding curriculum development, how to help them learn, etc. Steve’s earlier “Wheels for the Mind” ideal was long gone: buried in simple consumerism.

So: don’t confuse the Dynabook idea with the physical resemblance to the iPad. The latter has thousands of times the capacity of what I had in mind, but its conception is thousands of times more meager.

Today the goals for this should be much larger than those I had more than 50 years ago.

Who first came with the concept of Computer Mouse, XEROX or Apple?

Pointing Devices and Graphical User Interfaces (a partial history)

There was an early electrically sensed joystick in the 20s — partly invented for “remote-control airplanes”.

1926 Joystick Remote Control Circuit Patent

NRL’s C.B. Mirick patented his system for remote radio control of pilotless aircraft (or of “any circuits or objects which are to be controlled at a distance”) using a “‘joy stick’ identical with the ‘joy stick’ provided in aircraft.” U.S. Patent no. 1,597,416, “Electrical Distant-Control System,” Aug. 24, 1926.

image

“My invention relates broadly to distant control systems, and more particularly to a control circuit for a radio transmitter, and a selectively responsive circuit for a radio apparatus whereby particular controls may be caused to function individually or simultaneously.”

A track-ball appeared right after WWII.

The “light-gun” was invented by Bob Everett for Whirlwind in the very early 50s, used in the SAGE air defense system from the mid-50s on (which later also used track-balls), and memorably used by Ivan Sutherland in his iconic (literally and figuratively) Sketchpad system ca 1962.

image

MIT Whirlwind Light Gun ca 1951–2

Doug Engelbart and Bill English invented the mouse as an upside-down track ball in 1964.

image

The Engelbart-English mouse ca 1964

Also in 1964 appeared the really amazing RAND tablet, by Tom Ellis, et al.

image

Tom Ellis and the RAND Tablet ca 1964–5

Larry Roberts did the Lincoln Wand in 1965 — it was a very early 3D pointing device.

image

Paul Rovner (who did much of the “wand” SW) pointing at an icon in space

In the last half of the 60s all of these devices were in some use for interactive graphics systems, especially in the ARPA research community.

GUIs

The earliest graphical user interfaces came from the ability — using a pointing device — to indicate something of interest on a screen. It is hard to pin down a first here — it could have been a radar system plus physical buttons. The Whirlwind to SAGE progression in the early 50s used displays to show radar readings, coastlines, airplane codes, and sometimes text commands that could be pointed at. The nature of the slow calligraphic displays, flicker, etc., tended to limit the use of the pointing device to where only it could really serve, and relegated most other inputs to physical buttons and dials.

Ivan Sutherland’s Sketchpad system was done in 1962 on the SAGE test computer (the Lincoln Labs TX-2), used a light-pen (a light gun without the handle), a point plotting display, a lot of computing capacity, and a breakthrough set of ideas, designs and implementations including:

rubber band drawing of lines

automatic “attractive gravity between end-points”

constraints for programming — represented as icons

“masters and instances” (objects)

and many more of the earliest ideas for graphical interaction

Sketchpad cannot be praised too highly.

image

Ivan Sutherland making a bridge simulation in Sketchpad. He is holding the light pen. Other input devices are buttons to his left, and rotating dials in front of him

image

Using Sketchpad’s light pen to draw a rivet

At RAND, Tom Ellis et al were starting to make one of the great graphical user interfaces of the 60s (and all time). First, Gabe Groner used the RAND tablet to make the first really good gesture recognizers (ca 1965–6). This was used to make GRAphical Input Language (GRAIL).

image

There is a good demo video of this online. This system first introduced many of the GUI concepts we are used to (and a few we aren’t), including sliders, various kinds of iconic buttons, resizing and movement sensitive areas, intelligent drawing using the recognizers, etc. The boxes were hyperlinks to their interiors (which were also diagrams etc.). The diagrams could be simulated at various speeds, etc.

Another system that can’t be too highly praised.

And yet another:

The Engelbart NLS system that had the famous demo in 1968 was part of a larger overall concept with a strong balance towards organized text with a little graphics, hyperlinks, “panes” (vertically tiled windows), and a number of deeply important ideas.

image

Bill English and the NLS television display (from a calligraphic display) and the Herman Miller designed lap-chair with keypad, keyboard and mouse

image

Detail of the user interaction controls

This system has been poorly treated by journalists, who want to see in it precursors to what we have today. Instead most of the many important properties of these ideas have been quite missed today.

image

The Engelbart conception was a complex and important system made from highly intertwined subsystems that included humans, education, methods, languages, and tools.

Finally we come to Xerox Parc — also part of the ARPA research community — that got its first critical mass of researchers in early 1971. We were fortunate to have “all of the above” for context, plus some of the researchers above themselves (and the community itself was quite friendly).

We thought we could get a better interactive coupling with “personal computers” than could be had from time-sharing, so our effort was to see if we could make a one-machine-for-each-person system linked together with a network that could be comparable to the single user super computers used by Sketchpad and GRAIL. We decided to spend enough money and design effort to “work in the future” — to make personal computers for research that would be generally affordable via Moore’s Law in the mid to late 80s.

One of the very needed technologies was a computer display with the degrees of freedom of active paper. And so forth (I won’t recount the history of this effort here).

image

The Alto: showing the Engelbart 5 finger chord keyboard, the Parc digital mouse, the “supercomputer” below, a disk pack, the Ethernet “stinger” (upper right).

image

Full page display on Alto screen showing “printing-like multiple fonts”, half-tone painting and animation, multiple overlapping windows, icons, user-programming, user font making, etc. It was this screen and content that Steve Jobs saw when he visited in 1979.

image

The larger Parc display that can show two full pages.

image

An essay by a 14 year old girl about her programming experiences with Smalltalk that includes live coding in the bottom left, and a picture of the drawing system that she invented and programmed.

In retrospect, what were some funny idiosyncrasies of older computers?

Besides crashing randomly?

One that was funny in retrospect came in 1965 when I and others from the “future CDC 6600 community” — Livermore Labs, Los Alamos, National Center for Atmospheric Research, etc. — were called to CDC’s Chippewa Falls labs to help debug the supercomputer and cobble together enough software to make it worthwhile (this is another somewhat unusual story).

There were a number of causes of rather frequent hardware crashes of computers in those days. The 6600 seemed to have those plus a few more.

Looking from the top, the machine was the shape of a plus sign. You could open the panels and the logic was on doors that would swing out — the logic-doors were refrigerated racks of components with Freon tubes running through them.

image

The console — called a “Gazork” (a mythical beast) — was simply two CRTs (some of the machine had to be working in order to use it).

image

If you look closely at the wires on the doors, you can see that they are often long and of non-uniform length — this is because they tuned the timings of the machine by clipping these wires with a nail clipper!

image

And many of the wires had little tags on them saying what the exact length should be when replaced.

Further, the wires were also used as delay lines for the “parallel arithmetic units” of the machine — e.g. there was one multiplier unit but it was duplexed by using the wires as buffers for the partial products (it is shown as two multiplier units below).

And so forth. Add to this that the machine had 11 different processors of two types, no interrupt system, and no core memory parity test; that the 1 microsecond memory had 32 banks (selected by the low order 5 bits of an address) which had to be kept cycling 100 nanoseconds apart in order to match up to the 10 MIPS processor — partly by squeezing a few instructions into the tiny cache — etc., and you have more than a handful of “stuff” to deal with to get anything reasonable to happen.
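To see why the banking mattered, here is the arithmetic spelled out (a small Python sketch using only the numbers above):

```python
# Worked arithmetic for the memory-banking point: one 1-microsecond core bank
# can deliver only ~1M words/sec, far short of a ~10 MIPS processor; 32
# interleaved banks kept busy 100 ns apart are what make the rate plausible.

bank_cycle_ns = 1000      # one core bank needs 1 microsecond per access
num_banks     = 32        # low-order 5 bits of the address pick the bank
issue_gap_ns  = 100       # accesses spread 100 ns apart across different banks

one_bank_rate    = 1e9 / bank_cycle_ns    # 1,000,000 accesses/sec
interleaved_rate = 1e9 / issue_gap_ns     # 10,000,000 accesses/sec if banks never collide
print(one_bank_rate, interleaved_rate)

def bank_of(address):
    return address & 0b11111              # which of the 32 banks an address falls in

# Sequential addresses rotate through the banks, so they can overlap nicely:
print([bank_of(a) for a in range(8)])     # [0, 1, 2, 3, 4, 5, 6, 7]
```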

image

Rough diagram of the 6600. Of note are the independent parallel “central” CPU abilities on the right, and the large “memory machine” on the left with the 10 “peripheral processor” machines.

The oddest phase was when some of the random crashes seemed to happen when there were more than 4 or 5 people in the room. The 6600 could count people?

However, this resisted tests.

One of the displays on the Gazork was used to look at parts of raw memory (in octal).

image

One day, just by luck, someone noticed a bit flickering on and off when the central computer was idle. This eventually was found to correlate with whether the doors were swung out or not, and if out, with the number of people near that door. It turned out there was capacitive coupling between one particular door and the humans nearby, and this destabilized one of the bits in memory (with no complaint, because there was no parity checking), and sometimes this bit needed to be good for some programs.

I would call this an idiosyncrasy! And only funny after it was tracked down and fixed! There’s nothing all that funny about unreliably repeatable errors when they are happening to you …

What did the early room sized super computers do? And how powerful are they relative to modern day computers?

We could look at the AN/FSQ-7 — known as the “Q7” — which was the main computing architecture for the SAGE air defense system in the 50s.

It was actually two very large computers — comparable in size to a football field — where:

the computational elements occupied the second floor of a large concrete “blockhouse” (there were about 24 blockhouses around the US and Canada),

the air-conditioning and power supply occupied the first floor (and spilled over into an auxiliary building),

a command and control center was on the 3rd floor,

and many (~ 100-150) interactive graphical terminals with pointing devices for different kinds of tracking on the top floor.

image

There are better floor plans in the Q7 book (check it out), but consider:

image

The only good way to estimate the computing power of this machine relative to today would be to run benchmarks.

However, here are some statistics for each machine:

50,000 vacuum tubes

instruction cycle time = 6 microseconds for basic instructions

main memory (magnetic core) = 256K bytes as 65,536 32-bit words

secondary memories (drum) = 50 "fields" of 2048 words each = ~800K bytes

Much of the arithmetic was done on half words two at a time. The two machines ran the same program in case one went down.

Perhaps readers can try estimations against an iPhone or an iWatch.

Basically we are comparing billions of operations per second with about 170,000, so we should expect an iPhone to be considerably more than a factor of 5000 faster.

The iPhone XS has 4GB of RAM, which is about a factor of 16,000 more.

Perhaps a single iPhone XS is about 16,000 Q7s?
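Here is the rough arithmetic in a few lines of Python; the iPhone-side figures are loose assumptions, and real benchmarks would be needed for anything beyond an order-of-magnitude answer.

```python
# Order-of-magnitude comparison of a Q7 and a modern phone; phone-side
# numbers are assumptions, not measurements.

q7_ops_per_sec = 1 / 6e-6            # ~167,000 basic instructions/sec (6 us cycle)
q7_ram_bytes   = 65_536 * 4          # 65,536 32-bit words = 256 KB

phone_ops_per_sec = 2e9              # assumed: a few billion simple ops/sec per core
phone_ram_bytes   = 4 * 2**30        # 4 GB

print(f"speed ratio  ~{phone_ops_per_sec / q7_ops_per_sec:,.0f}x")   # ~12,000x
print(f"memory ratio ~{phone_ram_bytes / q7_ram_bytes:,.0f}x")       # ~16,384x
```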

I would be curious to see more careful comparisons.

How important was Ada Lovelace for computer science?

This is a fascinating question about a “synthetic” or “artifactual” science (math is one of these also). This is a parallel to asking “how many times was the arch invented?”, and “for a particular arch e.g. in the Arc de Triomphe, would it have mattered if the ultimate ancestor of that arch had died before the invention?” i.e. the arch is so useful that it probably was invented several times and once seen, was appropriated.

On the other hand, vowels in languages with symbols for speech sounds are so useful, one would think they would be invented numerous times, but history indicates it only happened once.

But calculus was invented several times, etc.

Babbage is not mentioned in Turing’s original paper on computability and “universal machines”, but years later is mentioned in Turing’s AI essay. In between apparently Babbage was mentioned at Bletchley Park during discussions with his mentor Max Newman, the driving force — with Tommy Flowers — behind the programmable Colossus. By this time there was nothing that knowledge of Babbage or Ada could have aided.

It is surprisingly hard to tease out where engineering ideas stemmed from, in no small part due to the way engineering is learned and advanced. For example, history thinks that the first electronic flip-flops were by Eccles and Jordan ca 1916, but seems to be vague on whether the use of this particular milestone circuit fed into the Tommy Flowers “valve logics” at Bletchley, or into ENIAC.

Flip-flops themselves had been around mechanically for a very long time — most especially in the registration mechanisms of 19th century pipe organs (something that Babbage was clearly not at all aware of — or he might have seen a much better way to make a computing engine ca the 1840s; the best technologies around that time for computing were pneumatic).

I think it might be fair to suggest that neither Babbage nor Ada were required to get computing invented, and it is likely the case that neither of them were direct inspirations for the much later inventions.

But we should celebrate them for being “early with insights”.

A similar chain to follow is that from Boole to Charles Peirce (a brilliant thinker who wrote ca 1870s that you could make a computing engine by using Boolean Logic and first proposed what we now call the NAND and NOR as universal logic operations). This operation is called the “Sheffer Stroke” today because Sheffer rediscovered these ideas in 1913.
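To make Peirce’s point concrete, here is a minimal sketch in Python showing NOT, AND, and OR built from NAND alone (the same construction works for NOR):

```python
# Everything else in Boolean logic can be built out of NAND by itself.

def nand(a, b):
    return not (a and b)

def not_(a):     return nand(a, a)
def and_(a, b):  return nand(nand(a, b), nand(a, b))
def or_(a, b):   return nand(nand(a, a), nand(b, b))

# Check against Python's own operators over all truth assignments:
for a in (False, True):
    for b in (False, True):
        assert and_(a, b) == (a and b)
        assert or_(a, b)  == (a or b)
        assert not_(a)    == (not a)
print("NAND suffices")
```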

Finally, it’s worth looking at what Konrad Zuse accomplished pretty much all by himself during the early 40s. This came to light years after the independent inventions of most of the same things that Zuse wound up with.

When we had to write histories for the ACM History of Programming Languages the eminent historian of science Mike Mahoney told us the key was “We would know what they thought when they thought it”. In the case of languages that were invented within the ARPA community, this was really difficult because there were so many ideas being discussed.

I think this is the case of Ada and Babbage. The question asked here probably can’t be answered accurately at this point. A vague answer for Babbage and Ada would be “Probably not a critical factor, and very likely not a factor at all”.

But she and Babbage should get full marks — as we give Zuse — for what they did do: very early insights and examples into many important parts central to what later became computer science.

Were the compilers of the first programming languages written in machine code?

I wonder if “machine code” here means “in on-or-off-bits” or “assembler”? And whether “programming languages” includes “assembler”?

In the case of a really bare machine with no reasonable way of using another computer to help, programmers would still write in something higher than “raw bits” — it could be something like flowcharts on paper with arithmetic expressions — and then carefully hand-translate this into a memory image of raw bits. This was tedious, but not considered out of bounds, because a lot of debugging was done by looking at a “core dump”: an image of memory, often in octal (base 8) numbers. So the programmers would have in their mind just what the CPU was actually chewing on and making.

Also, occasionally one could debug on the machine (the Air Force allowed a programmer about 3 minutes a day, and one had to work through a machine operator who could show a word in memory and flip switches to change it). The console switches were also used to dry start a computer by inputting a “boot loader”, usually for reading cards or paper tape. The CDC 6600 had a convenient panel of ordinary switches that one could use to set up a program of about 16 or so instructions to get things started.

The first interactive debuggers — via teletypes or flexowriters — appeared on Whirlwind at MIT and its offshoots at Lincoln Labs in the early to late 50s.

Assembly languages appeared quickly for every computer. Most of these had macro “expanders”, and some of these were quite elaborate — for example IBM Autocoder for the 1400 series had “tailored macros”: on each line of the macro you could write conditions on the parameters that would determine whether or not that line would be included in the expansion (thus optimization at assembly time). On a 1401 with e.g. just 2k 6-bit bytes of memory, this was a great help.
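A sketch of the “tailored macro” idea, in Python rather than 1401 Autocoder (the macro name, parameters, and pseudo-ops are invented for illustration): each line of the macro carries a condition on the parameters, and only the lines whose conditions hold are emitted at expansion (assembly) time.

```python
# "Tailored macro" expansion: each macro line is (condition, template).

def expand(macro, **params):
    return [line.format(**params)
            for condition, line in macro
            if condition(params)]

MOVE_FIELD = [
    (lambda p: True,                  "LOAD  {src}"),
    (lambda p: p["length"] > 1,       "SETLEN {length}"),  # only for long fields
    (lambda p: p.get("clear", False), "CLEAR {dst}"),      # only when asked for
    (lambda p: True,                  "STORE {dst}"),
]

print(expand(MOVE_FIELD, src="A", dst="B", length=1))
# ['LOAD  A', 'STORE B']  -- short field: the optional lines never appear
print(expand(MOVE_FIELD, src="A", dst="B", length=12, clear=True))
# ['LOAD  A', 'SETLEN 12', 'CLEAR B', 'STORE B']
```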

Given the tiny logical size — a few thousands of bytes of memory — and very slow speeds — fixed point addition times of ~ 6 microseconds or more (early serial machines like EDSAC had an addition time of 1.5ms!) — of the computers in the 50s and 60s, many programs were hand-crafted in “overlays” (no virtual memory or memory mapping). One way to think of this is that the workable memory of the computer was on the secondary storage devices such as tapes and slow disks or drums, and the tiny still slow core memory was used like a cache.

Some of the early computers used a drum as a primary memory — e.g. the IBM 650 and Burroughs 205 — and perhaps tapes or a slower drum (or nothing) as a secondary memory.

A concrete example from the 50s is the famous IBM 704 mainframe ca. 1954, the first commercial computer with floating point arithmetic, and an ancestor of many subsequent hardware and software systems. It had vacuum tube logic, could do about 12,000 floating point additions per second, and had a primary memory of about 18KBytes (4K 36 bit words) of magnetic core memory.

image

The IBM 704 Console and some of the CPU

Its advent in 1954 prompted John Backus to propose making for it an efficient high-level language with an optimizing compiler — it would be a first, and he decided to call it FORTRAN (for FORmula TRANSlator).

The most interesting thing about FORTRAN was the great effort put into optimizing the semi-higher level code so that it could come close to competing with hand written assembly code in both size and speed. The bet by IBM for this large risky project was that just being able to handle simple arithmetic formulas and a few other things — like simple indexing usually in loops — would promote adoption if the size and performance of compiled code was good enough.

Here is an excerpt from a letter about the first tests of FORTRAN in April 1957 by B.G. Oldfield:

Our average compiling time is approximately 15 FORTRAN statements per minute and using an expansion ratio of 7 this means that 105 machine language instructions per minute can be compiled. Considering the addition of subroutines and input and output, compiling time will normally require two to three times as much 704 time as an equivalent assembly.

You can see that this is right on the edge of acceptability in a normal shop of one small, slow, but very expensive, computer that had been purchased to run jobs, not assemblies or compilations. The theory back then was that programmers were cheap and their job was to do the optimizations and debugging offline.

This was even more acute in my 1401 shop in the Air Force ca 1962. The FORTRAN compiler for our 8K-byte 1401 took literally about 100 passes — writing back and forth to tapes — to compile a FORTRAN program that would fit. So it — and COBOL — which had also appeared were essentially only curiosities.

However, again from Oldfield:

I would like to summarize by saying that our experience to date indicates that FORTRAN will 1) be very easy to learn, 2) quicker to code by a factor of 4 or 5, 3) have fewer coding errors (approximately 1 per 150 ), 4) will reduce the overall cost of a written instruction by a factor of approximately 3 or 4, 5) will produce very efficient object programs, 6) should handle at least 75 percent of our problems.

From our experience to date, I am convinced that FORTRAN is a very significant step forward in program preparation for the 704 and will prove to be a very satisfactory programming system.

In other words, FORTRAN was a great step forward.

An exception to long compile times — which is out of the scope of this answer — was the Algol 58 compiler for our other machine: the Burroughs B220 (also with an odd architecture by today’s standards). This was the first “pass-and-a-half” compiler and it was quite efficient, but only so-so regarding optimization (the B220 was larger, so less optimized code could sometimes fit). One of the several motivations for this design was the very, very slow tape drives of the 220: gazillions of passes were out of the question. Two of the famous names who worked on this early compiler were Bob Barton and Don Knuth.

An interesting sidelight is from Tony Hoare’s Turing Award Lecture (ca 1980) where he tells about his first big project ca 1961: to do a workable subset of Algol 60 on the rather tiny and slow Elliott 803 computer. He had been very taken by being able to use recursion (and could see that this would make his non-implemented idea for “Quicksort” really easy to do). This also motivated his approach to use top-down recursive descent for the Algol compiler on the Elliott machine. He says:

“The compiler was structured as a collection of mutually recursive procedures, each capable of analyzing and translating a major syntactic unit of the language--a statement, an expression, a declaration, and so on. It was designed and documented in ALGOL60, and then coded into decimal machine code using an explicit stack for recursion.”

This is very like the earlier use of flowcharts and other higher-level forms as design guides for a program, and then hand-translating into machine code to get something that could run efficiently enough.
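Here is a minimal sketch of the structure Hoare describes — one mutually recursive procedure per syntactic unit, each analyzing its unit and emitting code — for a toy expression language, written in Python; the “machine code” is for an imaginary stack machine, nothing like the Elliott 803.

```python
# Recursive-descent translation: expr := term {'+' term}, term := factor {'*' factor},
# factor := number | '(' expr ')'. Each procedure parses its unit and emits code.

import re

def tokenize(src):
    return re.findall(r"\d+|[+*()]", src)

def compile_expr(tokens, code):
    compile_term(tokens, code)
    while tokens and tokens[0] == "+":
        tokens.pop(0)
        compile_term(tokens, code)
        code.append("ADD")

def compile_term(tokens, code):
    compile_factor(tokens, code)
    while tokens and tokens[0] == "*":
        tokens.pop(0)
        compile_factor(tokens, code)
        code.append("MUL")

def compile_factor(tokens, code):
    tok = tokens.pop(0)
    if tok == "(":
        compile_expr(tokens, code)
        tokens.pop(0)                 # consume ')'
    else:
        code.append(f"PUSH {tok}")

code = []
compile_expr(tokenize("2*(3+4)"), code)
print(code)   # ['PUSH 2', 'PUSH 3', 'PUSH 4', 'ADD', 'MUL']
```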

Good questions would be “What is the first compiler to be written in a higher level language?” and “What is the first compiler for a programming language to be written in itself?” I think I used to know this, but it’s drifted. Large parts of this problem were addressed by the first Lisp community at MIT in the early 60s, and this could be the answer. Very quickly, they used a machine coded Lisp to write LAP (Lisp Assembly Program), and there is mention of several Lisp compilers in the early documentation.

However, I think the first really serious example of a higher level language being used to the exclusion of all else was ESPOL on the Burroughs B5000 (ca 1962), which was a computer whose “machine code” would be called “byte-codes” today, and which had many other advanced features (some beyond current day designs). This whole systems design cannot be too highly praised.

The B5000 was designed to run programs from very high level languages like Algol, COBOL (and eventually Simula), and one big early decision was that there should only be higher level programming for the operating system and compilers. To this end they added a number of features to Algol to create ESPOL (Executive Systems Programming Language). The OS, the compilers, etc were all written in this language. Using the “1.5 pass” compilation and the efficiency of the machine, the B5000 could compile about 300 statements per second.

All this was possible so early on because this was one of the few instances in computing history where all parts of the end problem — including hardware, OS, languages, and deployable applications software, were all deeply considered in the overall designs of the whole system. Bob Barton was arguably the key figure for this project, but as with most great things, there were quite a few high talents involved. This whole systems approach to design was an inspiration for several subsequent projects, most notably those at Xerox Parc.

Who invented the computer?

There have been similar questions on Quora, and I’ve given a variety of answers. (Which I’ll try to find and reference here (would be really usable if Quora used that strange invention of “windows” to allow this kind of perusal on this page while writing an answer …).)

A computer is a process that deals with represented descriptions.

So you can make an “addition computer” by using two rulers … . When we make an algorithm — such as the greatest common divisor — we are making a computer (and this computer goes back to Euclid). In both cases we are doing some of the energetics for the computations.
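Written out, Euclid’s algorithm is exactly such a “computer” — a process over represented descriptions (here, pairs of integers):

```python
# Euclid's greatest common divisor, mentioned above.

def gcd(a, b):
    while b != 0:
        a, b = b, a % b     # replace the pair by an equivalent, smaller one
    return a

print(gcd(1071, 462))       # 21
```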

Here, I think “computer” means “programmable, digital, stored program computer”. Historically, we need to separate early good ideas for these from the actual building of successful mechanisms. For example, both Babbage and Ada wrote algorithmic programs, and Babbage had Jacquard loom-like schemes for programming a mechanical computing device. Ada felt the universality that might be possible, and expressed the idea obliquely in several of her writings.

Turing proposed a very modern kind of mechanism for a computer, and especially a universal computer in 1936. Mathematically this was equivalent to earlier results by Goedel and Church, but the form that Turing chose was loaded with pragmatic whispers. Some of these were later heeded by John von Neumann.

I think the award to the person who made the earliest real mechanisms that touched more of the important ideas in computing than anyone should go to Konrad Zuse. He was not the only one operating in the early 40s, but the scope of his accomplishments is really impressive. Konrad Zuse - Wikipedia

image

Konrad Zuse and the Z3 programming digital computer from 1941

Zuse started to build computing systems in the last half of the 30s. Because of WWII his work was lost for a while.

Here’s a good chance to use the computer you are using right now, to find out about Bletchley Park, Colossus, Max Newman, and Tommy Flowers. This work during WWII in the UK was also suppressed for many years.

After WWII, there was interest in creating programmable computers in both the US and the UK. Maurice Wilkes of Cambridge University in the UK attended a workshop at Penn where Eckert, Mauchly, von Neumann, et al attended. For Wilkes there was a bit more talking than doing, so he decided to just make a machine like the one they were talking about. He went back to Cambridge and did just that, producing EDSAC ca 1948–9 — my vote for “something wonderful” as a “first” real programmable machine.*

image

Note the CRTs in the background that were used as displays for EDSAC.

There is an online emulation of EDSAC that allows the user to program it — I’ve occasionally asked students to write some programs on it. Edsac Simulator

image

The online EDSAC Emulator

Finally, it’s very worthwhile to look at the MIT Whirlwind I computer ca 1950–51: not just a very modern parallel “mini”-computer (it required a few large rooms), but also the first “real-time” computer, with some of the earliest (maybe the first) interactive graphical displays, and (I think) the first interactive higher level programming language (“George” by Laning and Zierler).

image

MIT Whirlwind console room ca early 50s.

---

* This is not to omit “Baby” at Manchester, which was a small test machine that did get working a little before EDSAC. However, EDSAC was not a test computer, but the thing itself.

Why were computer monitors not rectangular to begin with?

I’m not sure I understand this question. Do you mean “as opposed to being circular” or “as opposed to being square”? If you clarify your question I might be able to answer.

In brief, many of the earliest computer displays were made from Cathode Ray Tubes originally used for radar (most were circular) or for oscilloscopes (most were circular).

image

World War II Radar Display

image

Whirlwind with one of the first interactive computer displays (early 50s)

P.S. The questioner — Leslie Anne — writes in a comment: I am thinking in terms of 90’s desktop PCs. That’s about as far as my knowledge/experience with any type of monitor goes. Thank you! :)

Movies are in rectangular “landscape mode” in part because this is a rough approximation to human two eyed field of view (as is an open book showing two pages, etc.) This is a little tricky because most of the acuity in an eye is in the small foveal region, and the eyes actually scan the scene and the brain puts the scans together to form a “psychological scene” that seems more expansive.

TV started with circular CRTs — but displaying a rectangular raster of scan lines — but their popularity led to both rectangular landscape screens of larger and larger size (which led to a switch to magnetic yoke beam control, etc.).

In the 1960s you could get a wide variety of both kinds of CRT in various shapes. For example, the FLEX Machine display used a Hewlett-Packard electrostatic CRT in landscape mode that was also rectangular (as did the initial RAND Tablet display a few years earlier).

image

Tom Ellis and his RAND Tablet using the large screen HP Electrostatic Monitor

The Xerox Alto Display at PARC was a high quality magnetic yoke display that came from Ball Brothers (of the aerospace company Martin Marietta). For a variety of reasons we first went “portrait mode” (because of the Dynabook and because we wanted a full page deep). Not long after, still in the 1970s, we were able to make a double page sized display.

image

Our main complaint when the 1980s came around was that the display sizes of commercial personal computers were greatly reduced but kept landscape, so that full pages couldn’t be displayed. Apple or one of its aftermarket companies made a display that could be rotated from portrait to landscape (and would tell the computer so the image could be changed accordingly).

What is your hobby that has the best bang for the buck in happiness per dollar?

I’ve never thought about hobbies in terms of cost — they are all about passion and “work/play” for the sheer fun of it. And my main ones are about “life itself” and non-optional. So the “happiness” part from these is enormously high.

I don’t know how to choose between reading and playing music — but I spend quite a few hours every day doing both.

A book takes me about 1–5 hours to read, and will cost around $20-$40 usually (I mostly read non-fiction).

So: perhaps $10/hour? (Does this even mean anything?)

Right now I’m learning to play the lute at least 2 hours every day. The cost is mostly getting a great instrument (I’ve found that it’s worth really committing if you are going to learn an instrument, and then being willing to pay for a very good instrument — it makes a big positive difference, even in the beginning, both for achieving sound and for deep inspiration).

For music, figure 1000 hours each year for lute practicing, add in attending 2–4 concerts a month, hours singing in choirs, etc.

Even if we don’t amortize the lute over multiple years, it roughly breaks even with books at $10/hour, and the net is quite a bit less over multiple years.

Both of these seem like incredible bargains to me (and most certainly not at all deserving of money worries).

What is the oldest or most unique computer you have ever used?

The “oldest” -or- “most unique” computer I’ve ever used? Let’s take “programmable digital computer” to simplify away from various kinds of analog computers and the abacus, and “written a program for” instead of “used”.

The “oldest” has two candidates (a) the RAND “Johniac” which started working in 1953 (b) the Lincoln Labs TX-2 computer, which dates back to 1956. However I didn’t write programs for these until 1967. The first real programmable computer I wrote programs for was the IBM 1401 in Air Force Air Training Command ca 1961, followed shortly by also programming for ATC’s Burroughs 220 machine. And then a host of others.

“Most unique” is harder because so many of the early machine architectures would be considered “unique” by today’s pretty vanilla conventions. For example, looking from today back then, all of the computers I’ve mentioned would be considered “unique”, and they were all very different from each other.

For me the most challenging computer on which to write “good” machine code was most definitely the Control Data 6600 ca 1965 — it was a parallel machine with multiple operational units and a tiny cache and resequencer that the programmer had to have accurate visualizations of to make worthwhile programs.

And then there is microcoding. The most challenging there was the FLEX Machine that Ed Cheadle and I did ca 1967–9. This was because the microcode memories available then that could be used for a personal computer were so slow. The remedy was to have a very wide code word (120 bits wide on this machine!) and try to do as much as possible within these 120 bits before having to fetch another code word (yikes!). By contrast, the Xerox Parc Alto had a fast microcode memory with quite nice 32 bit microcode words so just a few little tricks were required.

What was the last breakthrough in computer programming?

There are lots of dimensions to “computer programming” and “programming languages”, so simple sorting won’t be satisfactory. But there is also a dilemma — I think — in that “something happened” in the 80s to what had been major improvements every 11 years or so. In 1984 I was asked to write the lead article for the old Scientific American for a special issue on “Computer Software”, and included the following diagram:

image

The article noted with tongue in cheek that the transition from “a better old thing” to “almost a new thing” and then “a new thing” happened at about sun-spot intervals …

Lisp shows up both as a transitional language and as a “new thing” because more than most computer languages it appears as a kind of “material” as much as “a programming style”.

In 1984 I picked several other things that I thought were transitional. For example, the advent of spreadsheets brought up the possibility of massively parallel “swarm” programming, and I devoted some of my article to discussion about what Moore’s Law would allow. I thought the other promising direction would be “more meaning” in programming, and especially programming languages and systems that could do problem solving on behalf of the larger goals for the programming. This was inspired by what Sketchpad had been able to do for numerically constrained relationships, and extended the idea to “requirements” and “specifications” becoming “runnable” and “debuggable”. I said less about this in the article.

If we look at this today, 35 years later, it is hard to fit what happened into the ascendant qualitative levels I used for the first 35 years of programming. We can see that though Simula was one of the inspirations for Smalltalk, its transitional model remained as C++. The Planner to Prolog to Eurisko line petered out in programming, but did form a major part of a real “meaning based” system, Doug Lenat’s CYC. Interestingly, the “swarm object pattern matching” ideas did get reified by Joe Armstrong’s Erlang — he started independently with his own instincts and goals — and this language is quite active today, though perhaps not mainstream.

To try to be as brief as possible here, the surprising thing to me over the last 35 years has been the lack of “real engineering vigor” in “software engineering”. While other engineering fields were starting to use CAD and SIM to design and vet their designs — often on supercomputers — the shoemaker’s children wound up with no shoes: so much software development retained poor and weak methods from the past, staying with simulated punched cards and non-live development.

And in the last 15–20 years, the real engineering fields have been increasingly able to add automated “FAB” to — and derived from — the CAD<->SIM process. In software engineering, this would mean being able to automatically move from vetted designs on supercomputers to optimized systems that would work on commodity machines. One part of this optimization has been done — it does not use “meaning” but is more of a Moore’s Law device: namely the tracing JITers that are being used more and more. The seed of this technique was pioneered by the legendary Peter Deutsch for Smalltalk systems in the early 80s, and, today, has been taken far enough to be really useful along many dimensions.

As for programming itself, the rallying cry I’ve tried to put forth is: “It’s not BIG DATA, but BIG MEANING”. In other words, the next significant threshold that programming must achieve is for programs and programming systems to have a much deeper understanding of both what they are trying to do, and what they are actually doing. That this hasn’t happened in the last 35 years is a really unfortunate commentary on the lack of a maturation process for computing.

When Alan Kay talks about the size of the SAGE and the TX-2 computer, he uses the terms like "the size of the football field" and "the size of a building", what does it exactly mean?

Here’s an example of a SAGE four story concrete “blockhouse”, with the “power supply” on the first floor, the two large Q-7 vacuum tube computers on the second floor — they ran in tandem as one system — the “situation rooms” with enormous display projection equipment (Hughes and Eidophors) on the third floor, and on the fourth floor, the many display and light gun consoles for surveillance and tactics, all next to a football field.

image

Here’s a row of SAGE consoles with an Air Force staff sergeant using a light gun to point at the display:

image

About 30 of these blockhouses were built (most by the end of the 50s) and distributed around the country. The last one was decommissioned around 1981 or so.

Question for Quorans: where did they get the replacement vacuum tubes for the Q-7 computers during the last 10 years or more of operation?

What is it like to work at Disney as a programmer?

I wasn’t a programmer there for the five years I spent at Disney — but the company had invited me to bring my research group to be part of Walt Disney Imagineering, and — partly because I knew and admired some of the old time Imagineers — we wound up making the transition from Apple in late 1996.

The question is asked in the present tense — and about a showbiz company that has transformed itself several times since the Michael Eisner years that we were there. So I couldn’t say anything either about programming there or even working there.

I can say that in the 1996–2000 years “the creatives” at Disney were just great to work with, and that “the suits” were a mixed bag at best, with some of them being quite awful. The previous sentence was generally pretty true of movie studio companies, and is likely as true today.

Why is the guitar "easy to learn but difficult to master"?

The “difficult to master” part is common to all music because people who are deeply involved with music don’t just make it, they explore it, extend it, invent and reinvent it. Like sports, this pushes the boundaries towards human limits to the extent that even highly talented individuals have to do a lot of practice over many years to get near “mastery” (I don’t know of any really great musicians who would claim to have mastered their art).

Ken Hower — in these answers — says it well.

Many years ago I taught guitar to a wide range of students of different backgrounds, ages, motivations and psychological types. The trade-off in the early stages is some literal fingertip pain, but a few relatively easy chord patterns are within close reach, so that something that overlaps with most aspirations can be gotten to right away.

For younger children and some adults I suggested that they try ukulele first: the chord and note shapes are the same as the top four strings of a guitar, the touch is less painful, and much of the other musical learning of rhythms, listening, accompanying, etc. has an easier start-up ramp. This is especially true for both the baritone uke and the slightly larger 4 string “tenor” guitar.

The relatively easy “can do something early on” for guitar — compared to violin, trumpet, etc. — is also a bit of a barrier because what actually needs to be learned is just as onerous and requires just as much work and time as any other instrument (for the reasons in my first paragraph). I found — even with children — that this could be explained, and most of them could deal with the ideas: it’s going to hurt for a few weeks and try to just soldier through it, you will be able to do X in a month or two, but Y will take a few years. Every guitar player goes through this, and you can too.

I’m guessing that things are not so different almost 60 years later, but now we can add in keyboard synths as “can do something early on” that can soon provide barriers for really learning to play keyboards (and music).

What will Silicon Valley do once it runs out of Doug Engelbart's ideas?

I’d like to see.

However, we are in no danger at present. Silicon Valley has more or less ignored or abandoned many of Doug’s best ideas, so they are still around to be rediscovered and put forth.

Right now we are also not in any close prospect of seeing his ideas understood and used, which is really a shame.

Which is the best book for teaching programming to kids (11 to 16 years)?

The “best book” right now might not be good enough. I don’t know of a “really good” book for teaching programming to 11–16 years of age, but there might be one — I’d love to see what other Quorans think are good books.

A deep question here is “plasticity” with regard to learning a subject. We know that it is not a good idea to let children do deep learning in music — especially playing, especially classical — without helping them develop good technique (it can take years to undo bad technique — I know! — even with a lot of will power, and it usually doesn’t happen).

However, it is likely that many subjects will be plastic enough to allow several starts. On the other hand, “what is learned first” still often rules further development in various ways, especially when stress is present.

I don’t know of definitive studies regarding most subjects, including programming. There is a lot of casual and anecdotal evidence that many programmers fasten strongly to the first styles of programming they learn, and not only try to use these styles thereafter, but also oppose learning new styles if they conflict with their Ur-style.

If so, then most of the ways to teach programming — in general, and especially to children — are likely disastrous (as with both math and science, etc.)

Age 11/12 used to be considered the “model airplane” age/development stage (back when model airplanes were built from very discrete parts (often cut from balsa wood with razor blades)). It was also the “Tesla coil” and “ham radio” stage.

This is because new abilities for planning and organized doing start to show up at this developmental level — and quite a bit more fine muscle learning and control.

In our experience over the last 50 years or so, this is also a great age to really get into programming.

A key property of “model airplane”, “Tesla coil”, “ham radio” is that they are all reasonable sized systems, so that making them simultaneously involves “nuts and bolts” and seeing how a whole system is designed, organized and built.

To me this is absolutely critical.

We can also look at reading and writing with a similar eye, and we’ll see that a key here is literature — that is: contact with well designed things made by others that engender both pleasure and learning larger scale ideas and organizations. Trying to teach reading and writing outside of the context of ideas and examples and usage is not just a bad idea, it’s kind of crazy. (This does not prevent schools, etc. from being stupidly ham-handed about many subjects!)

And this is where most instruction in programming falls down badly. Much of it combines “ideas that are too simple to scale well” and “absence of larger organizations that show what the real deal is”.

This connects with a recent answer I gave to: Shouldn’t object-oriented “design” be studied before object-oriented “programming”? I’m sick and tired of these useless toy examples in these programming books. Why shouldn’t we learn to design first and then implement?

So: we need big ideas (could be from math, science, computing, etc.), we need live working examples of the ideas, that also contain ideas, that can be safely taken apart, explored, added to in meaningful ways, etc. We need a language and system that is “graceful” about being a substrate for all these. We need helpers — both human and system — that can provide insights when needed, etc.

Why not?

What’s the difference between applicative, recursive, and imperative programming?

Programming is providing a mechanism with what’s needed to achieve a goal we have.

Sometimes the mechanism can be dealt with very directly. Sometimes things will work out better if we first get the mechanism to act like a more convenient mechanism that we can then provide “what’s needed to achieve a goal we have”.

In the earliest computers, and at least well through the 60s, the code written by a programmer was called “instructions” (and often “orders”). The programmer was ordering the computer to do this and that, and the computer processes usually did the same to “data”. This was rather like cooking: with recipes, a cook, ingredients and tools. All of this would be called “imperative”. The simplest example is a store instruction to a location in memory, which obliterates what was there and replaces it with something else.
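To make that “obliterating store” concrete, here is a minimal sketch (in Python, with a made-up toy memory, purely for illustration):

```python
# A toy "memory" as an array of words; an imperative store simply
# overwrites whatever was at the addressed location.
memory = [0] * 16

def store(address, value):
    memory[address] = value   # whatever was there before is gone for good

store(5, 42)
store(5, 99)      # the 42 written above is silently obliterated
print(memory[5])  # -> 99
```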

Some realized that this didn’t scale well at all, and one of the early machines that could protect against willy nilly imperatives was the Burroughs B5000, much of which was the creation of Bob Barton.

Another idea which appeared early was that of “subprocedure”/“function” as bodies of code that expressed useful relationships, and would often employ parameters. The distinction between the two was that a “function” should be like a mathematical function: (a) it would create a value, so the function could be used where a value would be used, and (b) ideally, for the same inputs it would always produce the same output.

This led to the question of whether you could/should call such a subprogram while one was already executing it (e.g. FORTRAN couldn’t). But there were any number of problems whose parts resembled the whole (they were “fractal” in one sense or another), and the idea gained ground. Algol 60 was to be a kind of multipurpose programming language — so you could be both “rather imperative” and “rather functional” in it — and the subprograms could be called “recursively” from within themselves. On most computers of the day this required software to simulate the hardware that would be required (the B5000 had these mechanisms directly in the hardware).

I’m describing things in this fashion to make the point that you could do “recursive programming” and “imperative programming” in the same subprogram in a language like Algol 60.
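Here is a small sketch of that point (in Python rather than Algol 60, and the particular function is just my own toy example): a single subprogram that is both “rather imperative” (it orders a value to be computed and stored in a variable) and “rather recursive” (it calls itself).

```python
def sum_of_squares(n):
    # "imperative" part: order the machine to compute a value and store it
    square = n * n
    # "recursive" part: the subprogram calls itself on a smaller case
    if n == 0:
        return 0
    return square + sum_of_squares(n - 1)

print(sum_of_squares(4))  # 16 + 9 + 4 + 1 + 0 = 30
```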

Earlier, in the 50s, it was realized — especially by John McCarthy — that many attractive parts of basic relational mathematics could be shaped into ways to program computers that could be more convenient, less error prone, more amenable to “thought and proof” etc. On the machines of the day, this required the simulation of a “more mathematical” machine that could handle ideas such as functions operating on functions (“application”) etc.

In McCarthy’s way of thinking, you could make something like “sets” (really “collections”) by defining them as “a something” possibly followed by a “collection” — John did this in a clever way by making his base definition a “pair” of one thing hooked to another, and used this to get everything else. This is a “recursive definition” of a sequence.
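A sketch of that recursive definition (my own rendering in Python; McCarthy used “cons” pairs): a sequence is either empty or “a something” hooked to the rest of the sequence.

```python
# A "pair" hooks one thing to another; a sequence is either the empty
# marker None or a pair whose second part is itself a sequence.
def pair(head, rest):
    return (head, rest)

def length(seq):
    # recursive definition: the empty sequence has length 0; otherwise
    # the length is one more than the length of the rest
    if seq is None:
        return 0
    head, rest = seq
    return 1 + length(rest)

nums = pair(1, pair(2, pair(3, None)))
print(length(nums))  # -> 3
```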

His definitional scheme was extended to thinking about how one could define the language compactly in itself, and this led to a nice representation of functions themselves and interesting questions about what it might mean to use them as parameters (for example, if a function had a free (global) variable, what would it mean to pass the function as a parameter to be used eventually in a different context?).

The treatment of functions as “values” that could be passed around shows up clearly in John’s definition of “apply” in Lisp, a function that would “apply” a functional value to a list of parameters (which could themselves be functions, etc.), to create a new environment (by binding parameters to the function’s variables), so that the meaning of the function could be found via “eval”, and eventually a result would be found.
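A drastically simplified sketch of that “apply”/“eval” pairing (in Python, and not McCarthy’s actual definition — the expression format here is invented for illustration): applying a functional value means binding its parameters to the arguments in a new environment, and then finding the meaning of its body there.

```python
def evaluate(expr, env):
    if isinstance(expr, (int, float)):
        return expr                            # a literal value
    if isinstance(expr, str):
        return env[expr]                       # a variable: look it up
    if expr[0] == 'lambda':                    # a functional value "closes over" env
        _, params, body = expr
        return ('closure', params, body, env)
    fn = evaluate(expr[0], env)                # otherwise: an application
    args = [evaluate(arg, env) for arg in expr[1:]]
    return apply_fn(fn, args)

def apply_fn(fn, args):
    if callable(fn):                           # a built-in such as addition
        return fn(*args)
    # "apply": bind parameters to arguments in a new environment made from
    # the defining environment, then find the body's meaning via "evaluate"
    _, params, body, defining_env = fn
    new_env = dict(defining_env)
    new_env.update(zip(params, args))
    return evaluate(body, new_env)

global_env = {'plus': lambda a, b: a + b}
add5 = ('lambda', ('x',), ('plus', 'x', 5))    # 'x' is bound; 'plus' is free
print(evaluate((add5, 37), global_env))        # -> 42
```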

McCarthy’s ideas were so neat and clean that admirers — like Christopher Strachey and Peter Landin — looked much harder at ways to unify the simpler imperative schemes in the much safer much more powerful applicative frameworks. (One of the best of these is just a bit too far out of the scope of this note.)

Shouldn’t object-oriented “design” be studied before object-oriented “programming”? I’m sick and tired of these useless toy examples in these programming books. Why shouldn’t we learn to design first and then implement?

I like Marvin Minsky’s solution to the problem of teaching programming (in this case teaching Lisp to grad students in the 60s). He would give them the previous year’s PhD theses and ask them to find how the thesis did something, and then make a change to it. Sometimes this would be solo and sometimes with several grad students.

Similarly, in the early days of Unix at Bell Labs (when it was much more like its original philosophy), you would learn to program in both C and a variety of mini-languages influenced by SNOBOL by examining working systems and making changes to them.

And at Parc with Smalltalk …

In all these cases you are writing small programs to begin with, but in the context of more sophisticated designs. This is very much like learning structural carpentry in the context of building a house — the apprentices get small tasks initially but can see how the whole house is being organized, and they get to see a variety of design principles.

Both the Lisp and Smalltalk situations were dealing with live, incrementally improved systems, and so were especially amenable to this kind of learning (and Unix was “semi-live”, in that the Unix processes were live and the C programming was “a bit less than live”).

Does Alan Kay think about how research into human biology and medicine is conducted?

I’m not at all sure about the center of this question. Perhaps the questioner can clarify in a comment …?

At least at a shallow level, I would say “yes, I do”. However, I’m sure biologists consider me “lapsed” at this point (I certainly do, though I try to keep up via reading). I also occasionally visit bio-research centers — like Janelia Labs — and seem to be able to still hold up my end of a conversation with molecular and cellular biologists (though just how polite they are being I can’t tell).

I think I’m less up to speed with research in “human biology and medicine”. For example, I’m aware of a number of “pretty good” simulation models for cellular level organisms, but am not aware of good simulation models of human systems (for example, the human endocrine system is quite complex, and the immune system even more so). Another area that I’ve grazed but don’t feel at all definitive about is the human microbe ecology (of some 10,000 species and counting) that we are the hosts for, and for which we still have very little idea of the function of more than a few species among the 90 trillion cells that inhabit us without carrying our DNA.

I’m a bit of a hypocrite (or at least a do nothing) with regard to using animals for testing. I don’t like it at all for many reasons, but I’ve never been part of a protest or signed a petition. The most I did was to refuse to sacrifice animals while I was a student. I have a similar double standard with regard to meat — I eat it, but I grew up on a farm and got to see the whole process, and didn’t think it was good. (I just don’t think we can justify that kind of arrogance and expedience. And nowadays, it is quite possible to raise “meat without a brain attached” and we should do so. This would also deal with the greenhouse gases associated with meat animals (including chickens)).

I’m a scientist — even if somewhat lapsed — and so simultaneously love the knowledge about all manner of things that science has found and is finding. A large part of my lifelong interest in “cultural changing education” is from my dismay at how many humans find the weapons potential in any inventions that increase power. This is a disastrous route, and we are dangerously far along it.

What lessons were learned in aspiring towards the DynaBook, and have any of its original goals become dated?

A big subject over many years (the Dynabook idea was formed and fleshed out starting in 1968 with a cartoon I drew and a cardboard model I made — so more than 50 years ago now). The first public paper I wrote about this was in 1972.

image

The large idea that motivated the conception of a highly portable tablet computer with a touch-sensitive flat-screen display with stylus and wireless networking was catalyzed by seeing how Seymour Papert was helping children learn to do deep forms of real math by inventing a programming language — Logo — aimed to match up with the kind of thinking they can do at various developmental levels.

One of the ways this translated to me was that an interactive computer could be thought of like an active book. A larger idea — a “medium” — than the tool-based and “vehicle”-based metaphors of the 60s. And this brought forth all the connotations of literacy and civilization inducing changes of thinking brought by writing and the printing press. This idea fit very well into the ARPA ideas of inventing “interactive intellectual amplifiers pervasively networked worldwide” and widened it out to thinking of the computer as a “meta-medium” and that in any considerations of a new “literacy” whatever it was had to most strongly include children.

Many of the ideas for what a Dynabook should be able to do — and how — were adapted from the advances that the ARPA community had made from the early 60s to 1968. I thought that a language with the flexibility of Logo that was aimed at children in a similar way — and also included object-orientation for several reasons, including simulations; and that had some of the facilities of PLANNER to allow the kids to make systems that could reason — was called for.

Pedagogically I wanted to make “a world” that had the environmental properties of a Montessori school (with some echoes of Dewey also), and that was based around the ideas of Jerome Bruner and Seymour Papert. My main aim was to help children learn “science as both sciences and humanities”, so it was a bit wider than aiming at “math” (but included math). One way to think about a Dynabook is that it is mainly centered about all aspects of user interface design, especially for children, and not just about how to access, learn and use a computer but how to access, learn and use ideas.

One lesson learned was that Montessori was “even more right” than I thought back then. Her approach was very rich and very tuned to what human beings — and children — actually are. Her deep emphasis on design of environment as a living version of the epistemological stances she wanted the children to absorb — and especially “designing the adults” in this environment — has really stood the test of time. In the late 60s, we hoped that a Montessori environment could be made within the computer itself, and that it would not be necessary to do the extensive finding and training that was required to create a Montessori teacher.

After 50 years, I think that she was more right than we were. One of many perspectives on her approach was that she wanted to help many more children become real autodidacts. We on the other hand generally assumed that more children would become autodidactic just given a better environment. But a lot more is needed, and much of the motivation many children need for deep learning is social — especially from adults — and that doesn’t fit well into media, even with much better AI than we have today.

But, there is a chance that adults and peers can be motivators as kind of partners with computer media that is yet to be invented. If so, that would break a huge log jam today.

I don’t think the original goals of “helping children grow up to think much better than most adults today” have become dated. The last 50 years only provide more evidence for how desperately this is needed — and the next 50 years could bring enormous penalties for not taking this seriously.

I did allude slightly to the “cognitive sugar water” problems of making a new wide-spread networked portable public medium, but assumed that we could convince educators to include warnings and sensibilities about all media — writing, television, dynamic media, etc. — as part of a new curriculum for the 20th and 21st centuries. They didn’t get it, are really suffering the consequences of this (as is the country and world), and it’s not clear that they yet understand what has been and is happening.

As we progressed with experiments and user interface design in the 70s, we made up many slogans as mantras to help simplify the very complex business of designing for human beings. One was “The end-users are not like us”. Another was “Doing with Images makes Symbols”. And so forth.

Something that I think we understood at the time, but didn’t put a lot of effort into was the difference between “instrumental reasoning” and “larger scale thinking”. Humans have the former baked into them by their genetics but not the latter — so we should have made “other than instrumental” a larger part of our approach. Similarly, we have “coping” baked into us by genetics, but not “progress” (the latter was almost an invention — perhaps as late as the 18th century — to even make the idea visible).

There’s lots more that could be said (history is difficult because the past is so detailed), but I can sum up an answer to the question by saying: We learned many lessons — most of which were about human nature and how most people react to deep learning situations — and I think the original goals are still good, but could use a few more for better balance.

Will technological advancement continue as quickly as it has from the 2000s to now?

The keyword here is “advancement”. I think of the most important use of this in phrases such as “advancing the human condition”, etc.

If we take the trouble to try to understand our own species — via anthropology and its spinoffs — then we can see that quite a bit of the technology over the past hundred or more years could be considered “inverse vandalism”, i.e. making things just because you can (consider that every one of the 100s of 1000s of items in a flea market once was cared for so much that effort was put forth to produce — usually mass produce — them).

More critical have been the invented communications technologies. Writing changed thinking, and the printing press created enough people with changed thinking to change societies. I think this was a good change, but the Catholic Church and other such systems would not agree.

Electric and electronic communications systems changed latency of information transmission, but much of it also allowed oral and visual modes of thought to start replacing many of the literate modes that had grown up around writing and printing.

I think this replacement is bad, but marketing people, business in general, and authoritarian wannabes, would not agree.

A better ploy would be to have public education keeping pace not just with technology, but with an understanding of the species it is supposed to be helping to learn. This has been an abysmal failure. All communications systems need to be not just learned, but to be treated as powerful (and hence also dangerous) tools that will affect us as we try to use them (as Thoreau said, “We become the tools of our tools”).

This means that society is losing the battle of understanding what is happening to us. Ideally, in our day and age, the increased understanding that science has of many things should be factored into our societal planning and learning. When this doesn’t happen, we are letting both invention and the industrial revolution’s ability to make a lot of things very inexpensively and distribute them overwhelm us with great powers unbalanced by the perspective needed to use them (or to decide not to).

So, I think that technology will progress — because it depends on what we understand about the physical world — and that humanity will regress — because we generally refuse to try to understand ourselves and how we need to learn to handle our new powers.

How can I improve my explanation sense?

Answering Quora questions for which you are willing to do the necessary work is great practice.

Has a robot ever smoked?

I think the robot shown at the 1939 world’s fair (“Elektro”?) smoked as part of the act.

https://www.youtube.com/watch?v=AuyTRbj8QSA

Yep.

What do some not understand or know about the difference between an amateur and a professional? Has this ever affected you personally?

I’ve never liked these terms (and have been guilty of using them myself).

The two main definitions of “amateur” are “not paid” and “not good”. “Professional” means “paid” and generally connotes “good”.

But the French word from which the English term is derived means “lover of”.

We get:

lover — paid —— ability

no ——— no ———— poor

no ——— no ———— good

*no ——— yes ———— poor

no ——— yes ———— good

yes ——— no ———— poor

yes ——— no ———— good

yes ——— yes ———— poor

yes ——— yes ———— good

Most of these are useful, and some of them could use special terms. The * flags one category that I don’t like to see in any pursuit. Both of the italicized ones are admirable (and they do both exist thank goodness!)

How do I modulate from E major to B flat major?

Lots of ways — let’s see if we can get answerers to contribute some of their ideas.

I take “modulate” here to mean winding up in the key of Bb major and feeling that as a key center (as opposed to just getting deftly to the Bb major chord).

It’s worth trying to figure out ways from scratch before looking to see how e.g. Schubert did it (he was a great sleight of hand modulator).

There is also a very useful little book that has 1 bar, 2 bar, etc., modulation schemes for most possibilities.

A good cadence is V-I, and the V chord (or V7 chord) for Bb is F | F7. If we can get to this, then we can really cement the key change feeling to Bb.

It’s probably too brute force to just go from E to F.

C7 is the V7 chord for F. So how could we get there?

C7 shares a note with E, and going to the C7 while holding the E is a nice change. If we look at the voice leading, the G# would go to G, and the B would go to Bb or to C or both. This change is known to work well.

More subtle would be to go from E to C#m (its relative minor, which also has the E note), and, still holding this note, go to the C7, and then on to F7 and to Bb.

A faster route would be to do E, F#m (this is the ii chord in E), then holding the A in it, go to F7 …

And so forth.

---

Hard to leave questions like this alone …

As Jeremy Cotten pointed out in his comment, one of the fun things about this question is that the tritone interval between the two keys gives rise to both distances and similarities. For example E7–5 has the same notes as Bb7–5. Even better, their V7–5 chords also have the same notes: so B7–5 is also F7–5. A typical jazz cadence using this substitution would be “blah blah B7–5 F7–5 E”. You could try “blah blah B7–5 F7–5 Bb”, and it will be a little shaky but workable.

Another tritone approach is to listen to how they are used in Scheherazade, and note that if you pop between E and Bb with the same melodic motif in each (one of these in Scheherazade is E B E | Bb F Bb | E B E | etc. … ) you soon destroy E as the tonal center, and this could allow you to just take off from one of Bb landings …

Another route to finding modulation paths is to look at “voice leading” possibilities. For example, writing a melody from E up to Bb could go like E F# G# A Bb, or E F# G A Bb. Try harmonizing these. Or you can write a descending bassline: E D# C# C Bb — and harmonize it.

You don’t have to voicelead to roots (though a nice bass line can be very convincing). A fun game is to get from a chord to a next chord by just changing one note at a time no more than two semitones. You can find ways to get from E G# B to Bb D F.

If you are going to try voice leadings, you could first write both bass and soprano lines until you get something that sounds good, then see what the inner voices might be (a lot of very old music used this + some rules to help).

You can try other cadences. For example you could try IV to I. This would be Eb to Bb, so how can you get to Eb from E? (Lots of ways, try some). Using the above ideas you could even do E E7 E7–5 Eb

I quite love the iv min 6 to I (it is a sleight of hand one). You could notice that Eb minor in tempered tuning is the same as D# minor, and this is the iii chord of B. D# minor is also the relative minor of F# (which is the V of B, and the V of V of E).

Just to pick an example: I think the first section of 17 bars in Fauré’s Requiem is cosmically sublime. His progression over the final 5 bars of this is:

Eb/Eb F7/Eb | Gb/Db Db7/Db | F/C | A7/C# Dm6/D | A

Notice that this is the modulation that the question asked for, given as a “path to transcendence” by a composer whose brow was touched by angels.

image

It’s worth noting the voice leading of the top and bottom lines. The target in both cases is the A. The top line is reaching for it. The bottom line is in contrary motion until the top line reaches the A, then the bottom line also reaches upwards ending with an “Amen” 4th interval. The altos get to do the magical 6th. The tenor line is also worth noting. The dynamics taper to a hush. The whole effect couldn’t be more graceful and awe inspiring.

I can’t resist pointing out one more wonderful idea here. In the 3rd bar from the end there is an F/C, and then a first inversion A7/C# which really “wants to” go to a D chord of some kind. That an A major will be the final target almost immediately is completely hidden. I used the word “graceful” above, and I think this is the epitome of grace in more ways than one.

Also take a look at Daniel Michalak’s comment below for more illumination about the voice leading.

Transpose these five bars up a semitone, and it will go from E major to Bb Major as the question requested.

So: there’s a combination of existing lore and examples, plus a variety of music theories (not just tonal harmony) that you can use, plus noodling a lot, all of which can be combined to good purpose.

There are lots of perspectives on this — my personal opinion is that “voice leading is the master principle” and thinking along these lines with some of harmonic theory whispering will turn up lots of beautiful pathways.

How was the Xerox Alto done in only 3 months?

Basically: Chuck Thacker was simply amazing.

I recounted some of the history of the Alto in “The Early History of Smalltalk”, and both Butler Lampson and Chuck also wrote histories of that time that included accounts of the Alto development (all can be found via the ACM, and I think they are online also).

It was part of the ARPA research M.O. to “compute in the future rather than in the present” by either using existing supercomputers, or by making a kind of supercomputer. There was enough science and engineering substantiation to allow a reasonable faith that the supercomputers of the present would be the commodity computers of the future. Besides needing to try out ideas and designs in many areas, there was also the brute fact that inventing and implementing new software ideas took quite a while to do — with really big dreams, a 10–15 year horizon was needed.

I’ve written elsewhere on Quora a few things about the Alto, so I won’t repeat them here.

Was the Xerox Alto a prototype or a finished product?

To what extent did the Xerox PARC Alto inspire the Apple Macintosh?

I think I might have also written about “MAXC”, the emulated PDP-10 that was the first project done by the computer researchers at Parc (in 1971). Xerox wouldn’t allow us to buy one (it was in the process of becoming the standard TSS computer on the ARPAnet) — instead they wanted us to use a Xerox Data Systems computer — so everyone almost walked out right at the beginning.

We had in Butler Lampson very likely the best OS designer in the world — so we could do our own OS on the XDS machine — but even really good people need 3 or so years to do a good and reliable OS, so that was out. Then the discussion turned to “but we could make an emulated PDP-10 in less than a year” — and — “we could use and learn about the new VLSI DRAM chips from Intel”, etc….

Everything about MAXC (Multiple Access Xerox Computer) was quite different from the DEC PDP-10 (including being much more reliable). This project was headed by Chuck Thacker, and required most of the computer researcher resources then at Parc. The result — in just 9 months — was both a successfully running mainframe machine plus technological leverage, especially with regard to the new DRAM.

In parallel, the idea was floating that we would do some kind of a personal computer look-ahead to have an “Interim Dynabook”, to explore serious UIs and local applications, to try distributed computing using local area networks, etc. We decided that it would be worth sacrificing RAM to have a dynamic bitmap display (because then we could show “anything”). We wanted “printing quality fonts” and bitmap painting. I wanted dynamic 2.5D animation and 12 polytimbral voices of music synthesis, etc. We built HW in early 1972 that allowed simulation of all five of these aims. This gave us a pretty crisp understanding of what kind of HW power would be required.

We knew that we wanted the microcode of the computers to be able to efficiently emulate byte-codes as fast as main memory could cycle — because we were in the process of inventing a variety of new languages that would be implemented this way (Smalltalk, Mesa, etc.). The parties involved decided in the summer of 1972 on a scheme as to how that could be done nicely.

We had all agreed that we would always “engineer for 100 users”, so if we did a personal machine, we had to be able to manufacture 100 of them (in fact, close to 2000 of them were eventually made).

The actual project wound up being started — in late Nov 1972 — secretly because of some politics, and — as with a few other projects at Parc — as the result of a bet: Chuck said that a futuristic computer could be done “in three months” and a Xerox exec bet him a case of wine that it couldn’t be done.

Even though the DRAM was now understood, and the prior simulations established figures of merit for everything, there was still a lot of work to be done in just 3 months! (For example: designing and building and debugging the first machine!)

The Alto was put together partly using wire-wrap technology in one 19 inch rack. Most of the boards were DRAM (128K bytes organized as 16 bit words), plus a very fast couple of processor boards which held the 16-program-counter, zero-overhead-task-switched, microcoded CPU and register file. The backplane was also wire-wrapped.

Most of the detailed design and much of the building was done by Chuck*. He had two technicians to help (Larry Clark and Mike Overton). Ed McCreight (another of the truly amazing multitalented computerists at Parc) designed the disk interface.

Three months later: voila! We used to say: “Chuck just threw the parts at the wall and they fell down Alto!”

image

From left to right: Larry Clark, Chuck Thacker, moi, Ed McCreight, and on the screen the Alto’s first image, a bit-map painting of the Cookie Monster on “Bilbo”, the first Alto. (Mike Overton, not pictured)

image

A closer look at Bilbo at birth, the Cookie Monster, and the board monster

image

The Alto could easily emulate 12 high quality polytimbral voices in real-time (and 8 FM voices), and could handle organ console input from two keyboards and pedals.

image

The Alto could animate about 120 square inches of bit-map images at 10 frames per second in 2.5D.

image

The happy user in his office at Parc.

image

Chuck with his 2009 Turing Award bowl. He is one of those special people you can’t praise too highly in all respects of his life.

We were all shocked when Chuck went away and left us in 2017. It is a vast understatement to say that we miss him greatly.

---

* From the Alto Hardware Manual:

“The Alto was originally designed by Charles P. Thacker and Edward M. McCreight, and was based on requirements and ideas contributed by Alan Kay, Butler Lampson, and other members of PARC’s Computer Sciences Laboratory and Systems Science Laboratory”

In object-oriented programming, why is it bad practice to make data members public when the get() & set() public members modify it anyway?

If you use setters, you are not really working with objects, but glorified data structures. This use of objects as “Abstract Data Types” is not in the spirit of Object Oriented Programming and Design. This misuse is one of the biggest misconceptions about what OOP is all about — and it also removes much of the power of organizing modules whose contents are sealed off from the outside.

One way to think about “objects” is that each is an idea that holds possibly useful behaviors that can be requested (not ordered). So they are like servers with differential privileges. Since there are almost always many more ways to accomplish concepts than there are concepts — think of the idea of “sorting” vs the many ways to do sorting — it makes great sense to separate the “language of concepts/behaviors” from the ways to accomplish the concepts/behaviors.

This allows many kinds of scaling and reformulations to be done both safely, and while the larger system is running (hint: your systems design is poor if you have to stop it to fix it or change it).

Another way to look at this is that “data” in most imperative and static senses of the word needs to go away — it is much too fragile and exposed.

You can request services, and some of these might be as simple as asking for a particular value. If the server is representing a “person” then you might request “date of birth” and get a date object as a reply. If you request “age”, you would expect to get a duration object back. In the first case the date object might be held directly internally (probably not), and in the second case, the duration object is the result of an on the fly calculation (and which might be “continuous”).
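A tiny sketch of that contrast (Python; the particular Person class and its details are just my own illustration): the object answers requests for “date of birth” and “age” rather than exposing settable fields, and the age is computed on the fly rather than stored.

```python
from datetime import date

class Person:
    def __init__(self, name, born):
        # internal representation: nothing outside can depend on it
        self._name = name
        self._born = born

    def date_of_birth(self):
        # a request for a value; a date object comes back
        return self._born

    def age(self):
        # not stored anywhere: computed on the fly from "today", so it can
        # never be stale or set inconsistently from the outside
        today = date.today()
        years = today.year - self._born.year
        if (today.month, today.day) < (self._born.month, self._born.day):
            years -= 1
        return years

someone = Person("Example", date(1970, 5, 17))
print(someone.age())   # always consistent with today's date, and no setter exists
```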

A more sophisticated use of objects would be to have a “goal pool” of things the larger system needs done, and the server objects could be constantly looking at the goal pool to find things to do. (This is a kind of “Publish and Subscribe” — or Linda — type process.)

Etc.
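A rough sketch of that “goal pool” idea (Python; a toy single-threaded rendering, not any particular Linda or publish-and-subscribe system): goals are posted into a pool, and server objects look for goals they know how to take on.

```python
# The larger system posts goals into a pool; server objects repeatedly
# look in the pool for goals they are able to accomplish.
goal_pool = [("sort", [3, 1, 2]), ("translate", "bonjour"), ("sort", [9, 7])]

class SortServer:
    def can_do(self, goal):
        return goal[0] == "sort"
    def do(self, goal):
        return sorted(goal[1])

servers = [SortServer()]

def run_once(pool, servers):
    remaining, results = [], []
    for goal in pool:
        server = next((s for s in servers if s.can_do(goal)), None)
        if server is None:
            remaining.append(goal)   # no server has subscribed to this kind of goal yet
        else:
            results.append(server.do(goal))
    return remaining, results

goal_pool, results = run_once(goal_pool, servers)
print(results)     # [[1, 2, 3], [7, 9]]
print(goal_pool)   # [('translate', 'bonjour')] -- still waiting for a capable server
```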

Why are guitars tuned with an open B when a C seems more logical?

An interesting range of answers in the previous 9. It’s worth going back in time before the guitar to see how lutes, vihuelas, gambas, etc were tuned.

Ans: all in 4ths except for one major 3rd somewhere — and as noted in some of the other answers: 4ths because long scale lengths make playing chords in 5ths tuning tough, and a major 3rd somewhere to collapse just enough to make more chords more doable and (perhaps) to wind up with a double octave on a bass note.

The Renaissance lutes and vihuelas with 6 strings (actually “courses” — a “course” is usually a double string) have a tuning that is like a modern guitar except with the major third a course lower. Typical tuning from low pitch to high, with the major third between F and A:

G C F A D G

Around the end of the 16th century, more bass courses were added. These were usually tuned to help provide bass notes for particular pieces. But a very popular addition was a D a 4th below the low G:

D G C F A D G

Note that this actually provides a “guitar tuning”, reading from left to right, plus an extra top string. Lute players who are also guitar players (or vice versa like me) often take advantage of this to easily play guitar music in this tuning (and note, not too far from actual guitar pitch).

Further bass courses were added to the Renaissance lute (and there were alternate tunings also, etc.)

Around 1650, the French invented the “Baroque Lute” with a very different tuning on top: basically D minor chords:

(Top 6 courses from low to high): A D F A D F

The logic here was to allow dissonances and close chord voicings. There is an enormous amount of lute literature written for the Renaissance and Baroque tunings (the estimate is about 30,000 pieces). Almost all of it is written in tablature (so you need a particular tuning to be able to read and play).

Baroque lutes had lots of bass courses (up to 8 more) and these were tuned in a scale. So a typical tuning from low to high would be:

F G A Bb C D E F G A D F A D F

The archlute would often have the same bass courses scheme, but with the top 6 courses in Renaissance tuning (this is a very useful wide ranging instrument).

The guitar came along in fits and starts, with a 4 course version in the 16th century, and in Baroque times, 5 courses. These often were tuned more like ukuleles, but sometimes with a top course an octave lower. Again, 4ths with one major 3rd.

There is a repertoire for these, but a lot of the use was to play chords to back up singing and other instruments. The guitar with 6 single strings started appearing around the end of the 18th century, and was in full use in the early 19th century. Many of these instruments were fitted with extra bass strings as with lutes. What we think of as the modern classical guitar happened around 1850 with a number of design changes by Torres: larger bodies, a different form of bracing, etc.

What does Alan Kay think of using either Squeak or Pharo as the first language for introducing programming?

Helping someone learn something is a deep UI situation, and it has to start with the first word in UI.

I think of the process as being quite similar to teaching a musical instrument (which I used to do), and the best thing one can do in the very beginning is to get them to do even the smallest “real playing” they can do. What this is depends both on who they are and the nature of the instrument. This is very similar to the “bounce-hit” ideas of Tim Gallwey’s “Inner Game of Tennis”. His point was that “beginning tennis” is mostly “chase the ball”, so the question is how you can get an absolute beginner to “hit the ball” 90% of the time or more right off the bat.

In other words, you want to structure things so that the learner starts as a kind of “low intermediate” with actions that bypass the fumbling and fright of “a beginner”.

For 9-year-olds and up, a very good beginning is to use Etoys and the “Draw A Car and Drive It” first project and do it one-helper-to-one-learner. In about 20 minutes something quite fun gets done, and about 30 things are learned, about: objects, code, variables, loops, multiple processes, etc.

There are a lot of considerations in UI, and the UI of teaching and learning, but if you just wanted to pick one focus to help design, it should be “reduce cognitive load”. One of the ways to do this is to use the screen real estate to augment the short term memory of the beginner, and show them all the ingredients, and have room to work with the ingredients all in “an eyeful”.

The standard UIs for Squeak (and Pharo I think) are more organized for professional programmers who already know how to program and want to create systems of medium to large scale. The use of the screen is aimed in that direction much more than what I’d choose for beginners.

I think in almost all cases, most languages have weak interfaces for beginners. One of the top UI principles is “do not make end-users suffer because of your laziness or ineptness”, so some real work would be called for to make a decent beginning environment for beginners.

Bottom line is that I think there are enough actually important things that need to be gotten deeply when learning to program, that all other obstacles should be eliminated via making a pedagogical language for first use. It can be really powerful, but it needs to be in the mental space of the people who are trying to learn.

If you worked on pre-Unix time sharing systems such as ITS, Multics, TOPS-10, Tenex, or others, can you describe some of your experiences?

By accident, I entered the U of Utah Computer Science grad school in 1966, and found that it was one of the “ARPA Research Universities” (I had never heard of ARPA, etc).

Dave Evans had set up the department the year before when he moved back to Utah from having been head of the CS department at Berkeley (and of the ARPA research “Project Genie” there, which was aimed at making a very efficient time-sharing system on a relatively small computer (the SDS-930) by designing and making memory mapping and protection hardware, and a new kind of operating system to run it all).

Two of the luminaries there — both students at the time — were central to this design: Mel Pirtle (who was the main HW designer), and Butler Lampson (the main designer of the OS). Peter Deutsch was another key contributor to both the design and the implementation.

Project Genie turned out to be so successful that other ARPA projects wanted the machines also, and Bob Taylor — who was running ARPA-IPTO at the time — severely twisted the arms of SDS to make a commercial version of this machine with the Genie OS on it — the SDS 940.

And this was so successful that — a few years later using the same OS design — BBN did HW and an operating system for the DEC PDP-10 that became Tenex. Later (1971) the MAXC “fake PDP-10” at Parc was created to run Tenex. And so forth.

The SDS-940 machine was one of the main factors in the launch of the Time-Sharing industry. It is likely most famous and remembered for also being the machine on which Doug Engelbart and his group implemented the NLS system and showed its many impressive delights and approaches in “the mother of all demos” in Dec 1968. You can see this machine and OS at work in the online videos of the demo.

One could also dial into Engelbart’s 940 and use it via teletype, and at Utah we made quite a bit of use of this — as a grad student, I even taught a class using this system (Ed Catmull was one of the students).

Part of the reason teaching a class was so attractive was that one of the subsystems of Project Genie was CAL, an extremely efficient derivative of Joss (the first great end-user system, which had been done at RAND in the early 60s). CAL was one of Butler’s many projects, and it was an incremental, on-the-fly, line-by-line compiler, so fairly hefty projects could be done in it.

The specs for the 940 should be a bit shocking today. The machine had a grand total of 192K bytes of main memory (as 24 bit words) and executed instructions at about 0.5 MIPS. But it could run 20–30 users quite handily. It was “designed to crash” — most OS software was designed “to work”, and recovery from crashes could take quite a while; Butler’s design, on the other hand, spent some resources trying to “clean the kitchen” as it worked, so that a crash would incur minimal discomfort and could be recovered from in a few seconds most of the time.

Another really interesting wrinkle is that Butler’s design rewarded programmers for predicting their working set ahead of a time quantum (and punished them severely if they missed, by running their process on the slow queue). This made an enormous difference in efficiency because the OS could line up the next 3 or 4 users’ working sets ahead of time and not need to do random paging from random page faults (etc.)
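A toy rendering of that reward-and-punish scheme (Python; the structure and numbers are mine, not the actual 940 scheduler): a process declares the pages it expects to touch before its quantum, the OS can stage those in ahead of time, and a fault outside the declared set demotes the process to the slow queue.

```python
# Toy model of "declare your working set before your time quantum".
class Process:
    def __init__(self, name, declared_pages):
        self.name = name
        self.declared = set(declared_pages)   # pages the OS can pre-load
        self.queue = "fast"

def run_quantum(process, pages_actually_touched):
    faults = set(pages_actually_touched) - process.declared
    if faults:
        process.queue = "slow"                # punished: the prediction was wrong
    return faults

p = Process("editor", declared_pages=[1, 2, 3])
print(run_quantum(p, [1, 2, 3]), p.queue)     # set() fast
print(run_quantum(p, [1, 2, 7]), p.queue)     # {7} slow
```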

As a computer scientist, how much math do you actually use?

Mathematics is a plural because many ways to “represent ideas to allow reasoning about them” have been devised. That little “s” also implies that more can be invented as needed.

What this means is that sometimes one can use already devised “maths” to help in computing, and that sometimes one has to be able to invent new maths to help with computing.

I don’t think a person can be a real “computer scientist” (where we take the term “science” seriously) without being fluent in maths, including being fluent in being able to invent new maths when needed.

I suspect that a lot of the disappointing computing we see around us has been concocted by computerists who haven’t really gotten themselves invested deeply enough in the real field (to the extent that there is one).

What is, in your opinion, the best textbook for self-studying computer architecture and programming “from the bottom up” (where the bottom is the logic gates level)?

This is a great question! And I’ve periodically wanted to know the answer over the years — especially when I’ve wanted to teach “a whole systems” course.

I’m in London for a few months more and away from my main library, so I can’t list the books I’ve found in the past. I can say that none of them lived up to the corresponding book in molecular biology: “The Molecular Biology of the Cell” by Bruce Alberts, et al.

image

Over its 1000+ pages, this takes the reader from elementary chemistry all the way through how life works, in enough well written detail to be understandable.

There are still a few supplementary books that help a lot (for example: “Cell Biology By The Numbers” by Phillips and Milo). I think this would be OK for the current question.

Since I haven’t looked for a few years, my plan here is to buy and read all the suggestions made by the other answerers, and hope that a few good answers turn up. I will then report on them.

What I’d like to see is a book that starts with the many ways to make logic gates — as Danny Hillis’ “The Pattern in the Stone” does — and then moves to abstractions — such as the “stick diagrams” in Mead and Conway’s “Introduction to VLSI Systems” to sketch out simple memories, addressing schemes, data paths, etc. for something like a simple RISC architecture. The control for this could be a simple microcode (and later more microcode could be added in a separate memory).

We want to get to software as quickly as possible so the hardware design here can be a lot of fun to choose just what will help SW without itself getting too tricky.

Then we could “bootstrap a bootstrapping system” — this could be a simple virtual machine in which the symbolic processing to make a language could be done. And this could be used to write an operating system kernel that is expressed as the basic level objects in a protected object-based system.
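A minimal sketch of the “simple virtual machine” step (Python; the instruction set is invented purely for illustration): just enough of a stack machine to serve as a target for a small symbolic language, which could then be used to build everything above it.

```python
# A tiny stack-machine "virtual machine": a handful of instructions is
# enough to be the target of a toy compiler for a small symbolic language.
def run(program):
    stack, pc = [], 0
    while pc < len(program):
        op, arg = program[pc]
        if op == "push":
            stack.append(arg)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "print":
            print(stack.pop())
        pc += 1
    return stack

# (3 + 4) * 10, as a toy compiler might emit it
run([("push", 3), ("push", 4), ("add", None),
     ("push", 10), ("mul", None), ("print", None)])   # prints 70
```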

From here it would be easy to show how a modern UI and end-user tools could be made in a few thousand lines of code.

This gets at the essence of the “chain of being” without having to delve into the needless and enormous complications of modern hardware and software, most of which is quite superfluous to both understanding and even most practical usage.

---

(Edit on May 18th 2020) A few days ago I got asked questions like these by a CS department in the US, and told them that the Nisan & Schocken book couldn’t be praised too highly for the amount and quality of effort put forth and the high quality results they got.

And there is also the Patt & Patel book, which is also an excellent effort and result.

That said, today I think I would engage a faculty to rethink these books along several lines, and get their students to help — probably over several years of effort — to try to get to the next level of explanation.

This is because there is a difference between justly giving cosmic praise and really loving the result (e.g. you really can’t praise Simula or Lisp too highly, but the world of programming needs even better languages than these wonderful creations).

For example — compared to Nisan & Schocken — I would certainly have the ultimate SW targets be different and much higher level. I think there is enough experience and lore to really pull this off.

On the HW side, I (personally) would need a few months to think this through. Maybe there are two versions.

One would be the absolute minimum as far as architecture goes — but superfast with an enormous memory — to get SW going. This is the “tiniest Turing machine” idea.

The second would be an example of what a minimal-but-practical architecture might mean today (today’s equivalent of Chuck Thacker’s Alto at Xerox Parc).

Why aren't there any books about the history of modern computer programming (like 1970 onwards)? Most books seem to delve really far back and only just touch on the present and future of programming.

I take your question to imply “good books” (there are plenty of pop culture books about microcomputers onwards).

The best book about Parc is actually about the ARPA research community started by JCR Licklider — “The Dream Machine” by Mitchell Waldrop — this has just been republished by Stripe Press. Parc was both an outgrowth and an integral part of the ARPA community, and this book is “pretty good” about providing a fairly good history and context for the larger endeavor. A “heroes journey” book about Parc is “Dealers of Lightning” by Michael Hiltzik. And there are others.

For histories about programming per se, let me recommend the ACM “History of Programming Languages” series (now with 3 volumes). The first covered the “dawn of history” until about 1970 or so and was published in 1978. The second was published in 1996. Etc. The ACM selected the languages, and did a good job bulldozing the inventors of the languages into explaining how they came about.

Some of the most interesting languages did not get covered (the ACM criterion was that a language had to have been in use for more than 10 years) because they were exotic or were ignored for other reasons. However, Jean Sammet looked at a lot of programming languages in her book.

Perhaps more to your point — which asks about “programming” — I don’t know of many books at all — or even papers — which deal with the history of e.g. “programming large systems”. There’s “Mythical Man-Month” by Fred Brooks about programming OS 360. I think Howard Sturgis wrote a paper about the trials and tribulations of CAL TSS (an OS for the CDC 6400 at Berkeley).

I have seen several books about the history of Unix (which started in the early 70s about the same time as Parc started up). I don’t know how good these are.

But, there’s Xinu (Xinu Is Not Unix) which is a small Unix-like OS that is completely explained in several books, including how to build it.

There’s the Smalltalk “Blue Book” by Goldberg and Robson, which includes a hefty section on how to make a Smalltalk from scratch.

What is the nearest modern equivalent to Bell Labs or Xerox Parc?

I take the question as being mostly about “similar process” rather than particular topics of focus.

I haven’t found one that has the same feel. The two today that are worth looking at more closely are Janelia Labs and the Max Planck Institute. I know the least about the latter, but have heard very good things about their process.

Most of the others mentioned in the other two replies so far are much more top down and “directed” than Xerox Parc was (and most of the much larger Bell Labs was).

Further side points to ponder are that some of the labs today which don’t have a similar process might very well have started with the same process. A small growing organization can add new and crazy people and groups. There is usually a size limit, and when that is reached, the only way you can continue “new and crazy” is to nudge the older groups out (this might be a bad idea). Also, there is the problem that the very process that jells a group to be highly cooperative, often establishes de facto leaders who might not be best for the next phase of “new and crazy”, and in any case, they start looking a lot like management even when they are trying to avoid it …

Why does technology tend toward complexity?

In talks, I’ve made the distinction between “Complex” and “Complicated”, where “Complexity” is an intrinsic property of an organization of “ingredients” to satisfy a goal, and “Complication” is extrinsic — mostly due to human poor design, execution, etc. (This is not an original idea, you will find other versions of this bifurcation.)

The aim of “architecture” and “organization” is to have the result be able to do more than the ingredients.

Note that — because of our nervous system — we can sometimes get “more” even when the ingredients don’t themselves interact. For example, we can play separate sine waves whose frequencies are integer multiples of a common fundamental and perceive a complex timbre, even though the ingredients are completely linearly independent. Here the “complexity” is mostly within us (and for useful evolutionary reasons).

Most of the time, there will be interactions between the ingredients, often the communications paths will form feedback loops, and the results will be non-linear, often both literally and figuratively. The difficulties in predicting what such systems will do is also part of the measure of their complexity.

It’s worth noting that we plus our tools can often reduce the apparent complexity (if we have a real-time FFT handy, we can see much more about how separable the frequency ingredients of a sound might be …)
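
A small Python sketch of both points, assuming numpy is available: a few sine waves at integer multiples of a fundamental are summed into one “complex” signal, and an FFT pulls the separate ingredients right back out:

```python
# Sum sine waves at integer-multiple frequencies, then use an FFT to recover
# the separate "ingredients".
import numpy as np

rate = 8000                      # samples per second
t = np.arange(rate) / rate       # one second of time
fundamental = 220.0              # Hz

# Three harmonics: completely linearly independent ingredients ...
signal = sum(np.sin(2 * np.pi * k * fundamental * t) / k for k in (1, 2, 3))

# ... that the FFT separates right back out.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / rate)
peaks = freqs[spectrum > 0.25 * spectrum.max()]
print(peaks)   # roughly [220, 440, 660]
```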

The real costs of “complications” in software are rarely incurred quickly, so these have been allowed to get out of hand in ways that real engineering in the physical world tries to deal with and squelch early in the game. The ease of linking and patching and storage capacity in SW allows a kind of “brinksmanship” that is appalling, and has opened the door to poorly prepared practitioners.

Bottom line is that “complexity” is what we want to be able to handle, and we will have more brain cells for this if we do everything possible to limit “complications”.

Is there an OOP approach that uses a "milieu", not an object graph, to send messages, similar to how my voice is carried through the air and hits people who happen to be in range, rather than being addressed to a specific person?

A good early event-driven system was the “Sol” programming language by Knuth and McNeely (ca early 60s).

You can take a look at a number of locational “particles and fields” object programming ideas.

For example, this was suggested in my 1984 Scientific American article, and used in the Playground language (which had “near” as a property) a few years later.

A nice version was in Mitchel Resnick’s StarLogo, which used “patches” as locational environments. This was later used in “Wandering Letters” and other particle and fields systems, and adapted for the Etoys “Kedama” particle system.

John Maloney’s new “GP” language (a wide spectrum up-design for Scratch) has location sensing.

Etc.
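
To make the “milieu” idea concrete, here is a toy Python sketch (all names invented) in which messages are shouted into a shared field and received by whatever objects happen to be within range, rather than being sent to a named receiver:

```python
# A toy "milieu" dispatcher: messages are broadcast into a field and heard by
# whatever objects are near the point of utterance, like a voice in air.
import math

class Listener:
    def __init__(self, name, x, y):
        self.name, self.x, self.y = name, x, y
    def receive(self, message):
        print(f"{self.name} heard: {message}")

class Field:
    def __init__(self):
        self.listeners = []
    def add(self, listener):
        self.listeners.append(listener)
    def shout(self, x, y, message, reach):
        # Deliver to everyone "near" the point of utterance.
        for l in self.listeners:
            if math.hypot(l.x - x, l.y - y) <= reach:
                l.receive(message)

field = Field()
field.add(Listener("ant", 1, 1))
field.add(Listener("bee", 9, 9))
field.shout(0, 0, "food here", reach=3)   # only "ant" is in range
```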

Does Alan Kay see any new ideas in computing?

“New” is not what I look for. “Ideas that make a qualitative difference over past techniques” are what I’d like to see.

Years ago, I’m fairly sure I was aware of pretty much everything regarding computing that was going on in the world. Today, I’m definitely not aware of everything, so it’s reasonably likely that if there was something really great being done somewhere that I wouldn’t know about it.

I would be most interested in learning about “qualitatively more expressive” programming that is more in line with top-level engineering practices of the CAD<->SIM->FAB systems found in serious engineering of large complex systems in the physical worlds of civil, electrical, automotive, aeronautical, biological, etc. engineering.

In the CAD<->SIM part I’d like to see the designs understandable at the level of visualizable semantic requirements and specifications that can be automatically simulated (on supercomputers if necessary) in real-time, and then safely optimized in various ways for many targets.

Isolating semantics in the CAD<->SIM part implies that what is represented here is a felicitous combination of “compact and understandable”.

The FAB-part pragmatics are very interesting in their own right, and besides efficiencies, should be able to deal with enormous scaling and various kinds of latencies and errors, etc.

The above would be the minimal visions and goals that I think systems designers within computing and software engineering should be aiming for.

I’m not aware of something like this being worked on at present, but these days this could be just because I haven’t come across it.

Are shallow inheritance hierarchies better?

In English we often say “This is like that except …” and most of the time this is a plus if there is a good mapping between the two ideas and the “except” doesn’t destroy the mapping. In inheritance we are in theory saying something safer: “This comes from that via …” where the initial mapping is a form of equality or parametric equality, but where the same care has to be taken with the “via”.

If there are no “guardians of meaning” in our system, then we can hack our way into something as messy as we were trying to fix, and perhaps messier in a new and pernicious way.

This is similar to using an assignment as “equality” — hugely useful — but if we don’t guard it in some way we can destroy the idea of “equality”.

Since the “spirit” of inheritance can be easily violated in most languages with a shallow hierarchy, my answer is that “shallowness” could be more helpful in understanding or avoiding a potential mess, but that’s all.

On the other hand, I like useful abstractions a lot, so we might better try to think about forms of “analogical descriptions” or “differential descriptions” that pay their way in clarity and ease of use. These issues are ontological, and it’s worth looking at what ontologists have been able to do with representing ideas, especially with enough constraints to preserve the original comparisons and derivations.

The first good attempt I’m aware of to do inheritance in comprehensive ways that might have some clarity was the “Flavors” system in the first versions of Lisp for the MIT Lisp Machine. This had “operations”/”relationals” that provided a bit of a logical and algebraic feel, especially for multiple inheritance. (And provided the opportunity for nightmares of obscurity if hacking brinksmanship were done — which turned out to be the case.)

One way to think about this is that one of the most important issues to try to control well is “dependencies”. I think that most of these need to be controlled semantically and automatically — so something like the idea of “set of support” and “truth maintenance” in several of the better “expert systems” systems in the past (such as ART) where changes would cause ripples through the system to make sure that all dependencies remained valid.
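
As a toy illustration of the “ripple” idea (not ART’s actual mechanism), here is a tiny dependency network in Python where a change automatically recomputes everything that depends on it, so derived facts never go stale:

```python
# A toy dependency network: when a value changes, the change "ripples" to
# everything that depends on it.
class Cell:
    def __init__(self, value=None):
        self.value = value
        self.dependents = []          # derived cells to revisit on change
    def set(self, value):
        self.value = value
        for d in self.dependents:
            d.recompute()

class Derived(Cell):
    def __init__(self, inputs, fn):
        super().__init__()
        self.inputs, self.fn = inputs, fn
        for cell in inputs:
            cell.dependents.append(self)
        self.recompute()
    def recompute(self):
        self.value = self.fn(*[c.value for c in self.inputs])
        for d in self.dependents:
            d.recompute()

a, b = Cell(2), Cell(3)
total = Derived([a, b], lambda x, y: x + y)
a.set(10)
print(total.value)   # 13 -- the dependency stayed valid automatically
```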

I wonder if there are any great examples these days in modern programming languages?

What are your thoughts on the free software movement?

I think it’s very similar to “free speech”. It’s important to allow, but the equivalents of yelling “fire” in a theater should be curtailed.

Another facet of this these days includes what you can do with biology on a kitchen table with about $10K-$20K worth of equipment.

Or what you can make with 3D printing that wasn’t at all easy to make previously.

Vi Hart said “We must ensure human wisdom exceeds human power”. We aren’t putting forth the enormous societal effort needed to do this.

The consequences of anti-social actions continue to get more and more life-critical.

What makes for a great software demo?

Presentations are “theater”, and always about the audience.

The vast majority of human beings are “instrumental reasoners” (meaning that they mostly judge on how an idea or tool contributes to their current goals). Only a small percentage will be interested in the idea or tool as the possible start of new goals.

This means that showing “analogies” and “capabilities” is likely not to register as positive. Most people want to see more direct relationships to what they want to do and are trying to do.

Marketeers try to home in on and sell to what people want — where the “customer is always right”. Educators try to home in on what people actually need — and people often don’t want what they need — so the “customer” is rarely right.

So if you are trying to educate rather than sell — and this is what “good theater” is usually about — then you have to put in a lot of effort to get people interested in ideas, and to get out of their current simple “heat seeking” behaviors and judgements.

McLuhan was famous for not trying to get an audience to agree with him: his goal in a talk was to just get them thinking at all! This is actually a good ploy (though I’ll confess that I usually have didactic aims beyond just waking an audience up).

The 1979 demo of the Alto and Smalltalk we gave to Steve Jobs worked to the point that it did because (a) he was looking for something better than what they had, and (b) every time he asked a question or wanted a feature, Dan Ingalls and Larry Tesler were able to make it happen in a few seconds right in front of him.

However, as he mentioned some years later, he missed thinking about why Dan and Larry could do all of those things in a few seconds, and why the system could do what it did in general. And he missed the Ethernet, etc. He got what he wanted, but he needed more.

I think this sums up what was — mostly — a pretty effective demo.

Does Sequoia Capital have enough cash from its investments to create a modern day Xerox PARC?

Sequoia is the old Don Valentine company that funded Atari, Apple, and many others …

The computer part of Xerox Parc was relatively small — Butler Lampson has estimated that the many major inventions were done by about 25 computer scientists (and this would imply another 25 in support). So about 50 people.

Let’s double that to account for the eventual size of the computer part of Parc (about 40 computer scientists) plus support, so let’s say about 100 people.

Back then the costs were different (e.g. the inflation in housing was another factor of 10 times regular inflation). Salaries were also proportionately a bit lower. But hardware costs were quite a bit higher — both in general, and also because Parc bought its way almost 15 years into the future by making its own personal supercomputers and anything else needed to “compute in the future rather than in the past”.

I’ve estimated that the yearly costs for something like what we had in today’s dollars were about $12M-$15M.

If we expanded this to $20M to $25M per year in our estimate here, we should be able to fund a lot of capability.

So a better way to pose your question might be: “Are there any Fortune 500 companies that couldn’t afford to set up a Xerox Parc?”

Or: “Are there any governments in the world that couldn’t afford to set up a Xerox Parc?”

Or even: “Are there any major universities that couldn’t afford to set up a Xerox Parc?”

I.e. If you restrict your head count to just the very best, and support them, and there’s a great vision they can feel, and they can work together and find their own goals and problems from the vision … wow!

Did early computing necessarily have to be in binary, or could there have been another completely different concept that would have evolved into the technology we enjoy today?

Babbage’s “Mill” (the CPU of his engines) — along with other arithmetical computers before and after — used gears, and worked in base 10. These schemes were “digital” — meaning they represented discrete values. Since the gears turned, the apparatus was “analog beneath”, and part of the apparatus had to ensure that the values were eventually quantized. (And consider the several roles of the escapement mechanism in a clock …)

There were also quite a few analog computers made over the years, especially towards the end of the 19th century and well into the 20th century that used mechanical or electrical or combinations to represent quantities plus or minus a bit of noise.

One of the many difficulties with noise is that it can often build up to the point where the quantity is no longer a reasonable approximation. Making a system that deals in discrete values helps a lot, because you can reset communications from each stage (one way to think about it is that quantizing a signal reshapes it into something standard).

So you could easily make a communications system that transmitted 10 voltages spaced widely enough apart so that which one was intended could be recognized by the receiver and reshaped to the standard value. You could make operational hardware to add, subtract, etc., these values. This would be “digital decimal”.

However, you could also take advantage of the fact that you only need 4 “binary digits” to cover the values from 0 to 9. And 10 binary digits will represent up to roughly 1000 values, and 20 up to roughly a million.
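
The arithmetic is easy to check (a small Python sketch):

```python
# How many binary digits are needed to cover a given range of decimal values.
import math

def bits_needed(n_values):
    return math.ceil(math.log2(n_values))

print(bits_needed(10))      # 4  -> one decimal digit fits in 4 bits
print(2 ** 10, 2 ** 20)     # 1024, 1048576 -> "roughly a thousand", "roughly a million"
```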

So: if making “binary digits” allows much deeper engineering and ultimately simpler systems design, then it’s worth concentrating on a completely binary system.

And, we don’t have to give up on analog ideas either, because we can use sampling to represent any waveform at any desired accuracy (even far beyond the ability of any analog system to stabilize). This is what happens these days with sound recording and reproduction (if it is really done well, you get the best of both worlds).

How do you tune the 1st string of the guitar without a tuner?

An old trick, which may not work so well today, was to use a fluorescent light* as a reference.

They used to emit a nice 60Hz hum (actually one or two octaves above this depending on the fixture), and this is just slightly sharper than a Bb (the Bb is a little above 58Hz). You can tune one of the Bbs on your guitar to this (e.g. the 6th fret on the E string) and tune the rest of the guitar from the open E.

---

*I should also have mentioned that the average guitar amp still puts out a bit of 60Hz hum, so this could be the go-to solution today. (On the other hand with a computer or smart phone you can ask any Internet browser for any pitch … so that could be the gazillion transistors solution to this problem …)
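
For the curious, the arithmetic behind the trick, in a small Python sketch using equal temperament relative to A440:

```python
# Equal-tempered pitches relative to A440.  Bb1 is 35 semitones below A4.
def pitch(semitones_from_a4):
    return 440.0 * 2 ** (semitones_from_a4 / 12)

print(round(pitch(-35), 2))   # ~58.27 Hz -- the Bb just below the 60 Hz hum
print(round(pitch(-29), 2))   # ~82.41 Hz -- the open low E string
```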

Is computer science distinct from the liberal arts?

Let’s first collect a few opinions on what constitutes “liberal arts”.

Wikipedia says:

Liberal arts education (from Latin liberalis "free" and ars "art or principled practice") can claim to be the oldest programme of higher education in Western history. It has its origin in the attempt to discover first principles – 'those universal principles which are the condition of the possibility of the existence of anything and everything'.

The liberal arts are those subjects or skills that in classical antiquity were considered essential for a free person (liberalis, "worthy of a free person") to know in order to take an active part in civic life, something that (for ancient Greece) included participating in public debate, defending oneself in court, serving on juries, and most importantly, military service. Grammar, logic, and rhetoric were the core liberal arts (the trivium), while arithmetic, geometry, the theory of music, and astronomy also played a part in education (as the quadrivium).

The Liberal arts today can refer to academic subjects such as literature, philosophy, mathematics, and social and physical sciences.

Many universities follow this general scheme, e.g. “the engineering disciplines” are usually placed in a different division of the university.

And if we take the early notions of “computer science” as put forth by some of the founders of our field — Herb Simon, Alan Perlis, etc. — which take the word “science” literally as “understanding by making maps/models drawn from phenomena” — and I agree with them — then we would place “Computer Science” in the Liberal Arts division along with mathematics and the sciences.

So, my answer to the question is “No”.

The main good thing about this is that it encourages universities — and profs and students — to not just use “CS” as a term for many things that are more properly engineering pursuits than science pursuits, and perhaps to try to do a little more proper “computer science”.

However, I’ve always felt that there is much too much stovepiping going on everywhere, and that “liberal arts education” — broad and useful as it is — should be even broader. This is not just because important subjects like “real science” have important engineering parts to them, or that it’s not possible to do engineering in most areas without getting fluent in science.

For one thing, it is possible to get a liberal arts degree today without getting even to the first stages of fluency in science or math. This seems quite wrong, if only from the standpoint of the emphasis on important and central ideas for humanity.

As a much larger example, today I don’t think that 'those universal principles which are the condition of the possibility of the existence of anything and everything’ can exclude modern engineering and allied fields, because the latter have brought a lot of important new things into existence, and therefore have shown that they exist, and therefore the processes and principles, etc. should be added as a proper focus of philosophy, etc.

I believe I partly have this opinion because of getting an “A in unpopularity” with my Junior High School teachers, who told me that I had better pass the test for Brooklyn Technical High School and “get out of here, or else!”. So I did, and did.

This led to a regimen that was about 6 years of work stuffed into 4, in which the regular HS academics, more science and math, and a four-year engineering curriculum were jammed into long days with 6000 other boys in an enormous block-long, block-wide, 8-story building. “Electives” were so elective that you did music, sports, etc., after school hours: this actually worked quite well, albeit requiring longer days.

This was one of those experiences that was quite illuminating and helpful after it was over. I think most of us would agree, afterwards, that the extra work elevated every part of it.

One of the surprising things I found when I got to college is that they didn’t make you work very hard to get a degree. I had gotten used to Tech so I wound up with 2 major concentrations and 2 minor concentrations (and I was very happy that there was some time left for working on theatrical productions in the excellent theater department).

Still, even back then in the early 60s, I felt that the universities should be more heavy handed about getting students to get fluent in “humans in the large and civilization in the large” as a necessary part of a modern education.

I fear that this is even more complicated today, given that (a) high schools seem to be accomplishing less, (b) more and more time in early college years is more like remediation than new learning, and (c) even major universities today seem to be doing something more like trade school training for vocations than to carry out the proper processes for education.

I’m not advocating setting up college like Brooklyn Tech! For example, each semester, besides math, science, English, history, and a foreign language, there were also at least 3 engineering courses (of which one was “engineering”, another was “design and mechanical and industrial drawing”, and the third was some form of large-scale hands-on “shop” experience (like green sand casting, industrial machine shop, building a house or airplane)).

However, to just pick one illuminating example, the “engineering course” for all of Freshman year was “Industrial Processes”, which was in depth learning about how pretty much every main thing in 20th century society was made, and in quantity: every erg of energy, metal, plastic, wood, paper, petrochemical, pharma, etc. I found this fascinating from so many angles — and facing up to the industrial revolution in depth by looking at the hows and whys also provides grist for modern philosophical mills.

Today I couldn’t imagine setting up a two year “humans and civilization” course in a liberal arts college without including something like this, and a number of other important things drawn from outside the traditional liberal arts.

Could you?

What would Alan Kay suggest is the ideal way for somebody to start learning about computing at a corporate level?

After procrastinating for years I finally seriously took up classical pipe organ at age 40 while Chief Scientist at Atari and then a Fellow at Apple Computer. As with most people of that age, I was already quite busy.

Learning the pipe organ took 2+ hours a day for quite a few years, with a ratio of 2/3 classical piano to 1/3 organ. Sometimes this would be just classical piano for many days and even months.

I mostly did this by getting up really early in the morning to have the freshness, time, and solitude needed for the 2–3 hours a day I spent. (It was quite difficult.)

The basic idea of fluency is to put enough time and focus into the learning process to have the new things “become you” as “you become them”. Part of the result is a different and fleshed out “intuition” about what you are trying to do to go along with the many skills needed.

I was helped by having learned other instruments (jazz guitar) fluently when I was a teenager. This did not help with the mechanics at all, but did help with having a feeling for music, how it is written, what it means to play it, etc. There was no question that many parts of this learning were much more difficult to internalize than when I was a kid.

Another perspective is that I’ve worked with many intelligent busy adults over the years, both in corporate and educational type settings, and have found that most of them, even with considerable desire, have not been able to put enough sustained will together to fit a really new big thing into their adult learning curve (I in fact was quite desperate about this for many years before I just did it).

However, in an earlier period of my life I also taught guitar to a wide range of students, including adults. The adults had a real advantage over the kids by having much more perspective about themselves and their processes — they could decide to practice, and would practice. They were generally much more limited than the kids via how long it took to learn physical things, and their biggest barrier was often having a much stronger idea of a goal (which was almost always to be able to play a favorite song or two like one of their musical heroes). So it was hard for them to be happy with stuff that wasn’t their target. I spent quite a bit of time making simpler versions of their target so they could feel they “were on their goal”. The kids on the other hand learned more easily but usually had real problems with practicing. The ones that didn’t have these problems just soared.

A lot of the consulting and advisory boards etc. I’ve done and been on over the years have been for corporations, and two of the main problems are that (a) the higher-level executives don’t understand computing, and (b) the employees who “do computing” don’t understand computing.

Both are really difficult to teach for the reasons above, and the additional difficulties of “(b)” of having gotten fluent in a weak version of the pursuit. Since humans are terrible at forgetting, new learning has to go over what is already there, and there can be deep conflicts and frustrations.

And given this, it should be obvious that the “(a)” people have also learned things that will make learning some of the new things more difficult than they would be for a child.

Still, a big plus — and the military has this also — is that if the corporation is interested in having its employees — especially the higher-level ones — learn about computing, then by fiat it can make the time available during the day as part of the jobs (and salary) of the learners. Quite a bit of time will still be required, but under these conditions quite a bit can be done, and perhaps rather analogous to what it would mean to have the goal to get fluent in a foreign language.

And, in fact, there are examples of this in corporate history. My favorite, and I got to watch the process firsthand while being on the corporate advisory board, was the computer/systems-design training regime of the 32,000 employees of Andersen Consulting in the mid-80s. They were extremely serious about how they went about things, so they had bought a 4000 bed university (formerly a Catholic institution) in St. Charles, Illinois and used it to train their “Arthurs” in many things, and most especially in “Method/1” which was the strategy and tactics of design and implementation for their engagements as consultants with their clients. Pretty much every “Arthur” spent at least two weeks each year at St Charles, and in their earlier years, spent much longer. All of this plus exploratory research was done by the Technical Services Organization (run while I was there by the amazing Mel Bergstein).

This was essentially “plug compatible commando training” and was tremendously successful. It missed much of what was really powerful about computing, but it was very compatible with their clients, and thus made a large and successful business. If it were music, it was about marching, playing marches, and composing marches.

The TSO people were much more adventurous about the larger scopes and powers of computing, and part of the process I was involved with was to see what could be done about the larger issues of computing, especially given the resource of St Charles. (This also eventually involved getting the inimitable Roger Schank to bring his whole sub-department at Yale to Northwestern to help shape the new experiences (a different story for a different time)).

One of the most interesting groups within TSO was the “object-oriented design” team that was based in New York. I think there were 30 or perhaps 40 of them, and this group had gotten very good at understanding real object-oriented design and implementation. This took them about 3 years, and they were supported and sheltered by Bergstein. (Another great story for another time was how all of this paid off with the still amazing Brooklyn Union Gas design and implementation during the late 80s.)

I think it is much easier for a group of people to learn good versions of “real computing” together, and they can if they are given the time and the support by their company. The most critical parts of the process are how to introduce the really important things that need to be learned that will seem to be “off task” by both the company and the learners (this is because most “normal” corporate computing could hardly have a worse process or ideas about making systems).

So, as usual, learning something new and important is almost always going against “normal” (which if you think about it in regular society, means “doing something crazy”). This is the biggest hurdle for improving computing in corporate organizations.

What classes did Alan Kay teach at New York University’s Interactive Telecommunications Program, UCLA, Kyoto University, and MIT?

An elliptical answer in two stories …

In the Fall of 1966, via quite accidentally winding up as “grad student #7” at the U of Utah computing grad school — after undergrad in math and biology and working my way through college as a programmer at the National Center for Atmospheric Research in Boulder, Colorado — I discovered that it was one of the “ARPA Project” universities (which meant absolutely nothing to me as I walked into Dave Evans’ office).

Many things hit me over the first few days which I’ve recounted elsewhere, but I’ve given less mention to the thrill of discovering that the main designer of the Burroughs B5000 computer — Bob Barton — had been convinced by Dave to move to Salt Lake and be a professor while he was consulting for Burroughs.

I had learned enough about this amazing machine to want to understand the parts I couldn’t make sense of. So I signed up for Barton’s “Advanced Systems Design” course with great excitement.

Bob Barton was about 6′ 3″ or 4″ with the face of an imp, a mind that ranged from impish to extremely impatient and depressed, a command of language and vocabulary rarely encountered, and one of the true geniuses of our field.


He did not like students (nor most computer people, nor most people).

He stomped into the first class of this course and handed out a list, saying:

“There are a few things known about advanced systems design, and most of them are in these papers. I expect you to read them and understand them thoroughly.

“But my job here is to firmly disabuse you of any fondly held notions you might have brought into this classroom.”

The “course” was him finding out the many things that each of us believed and demolishing them. (Later I found out that quite a few of the things he demolished were also things he believed in!)

This was one of the two or three greatest gifts any teacher has given me.

The second story was told to me by my friend Neil Postman (the marvelous author of “Amusing Ourselves To Death”, “Teaching As A Subversive Activity”, and many other wonderful thought provoking books).

When Neil was a grad student (in English), he and another grad student decided to take off from school for a while and meet Marshall McLuhan — this led to them following him around from talk to talk. McLuhan didn’t really sleep, so the two kids would sit up with him in his hotel room while he smoked cigars in bed in his dressing gown and just talked and talked and talked.


Neil said they noticed that when Marshall gave a public talk and was asked a question or was challenged on a point, he would just ignore it, and would say to the person in the audience: “Well, how about this one?” and proceed to utter more of his “zen-koan-like” slogans.

After a few of these they realized that McLuhan didn’t care whether the audience agreed with him or not. McLuhan cared about whether the audience was thinking at all. And his job — and his life — was to wake them up.

I hope the point of these two stories is clear. One way to think about it is that anything that is good that happens in “schooling” is when a human being decides to put in the work needed to learn deeply enough so they wind up a different and better thinker as a result.

(What then should be the job of the “teacher”?)

Bob Barton and Marshall McLuhan both understood this.

What are the Seven Wonders of computer science?

I love this question — in no small part because I don’t think a really correct/exclusive answer is possible.

I have a particular set of meanings for the term “computer science” — the questioner might have meant to say “computing”. A few of my items might be taken from “computing” rather than from “computer science” per se.

The things I have found to be astonishing and amazing (and shocking) are:

Turing’s notion of machines that can simulate machines completely by interpreting their descriptions (exhibiting the programmable computer as “a language machine” and a “meta-language machine”) — along with this is the simplicity of what is required to do so (a great book is Marvin Minsky’s “Computation: Finite and Infinite Machines”). Turing’s approach is much more of a “real CS” approach compared to Goedel’s earlier methods, and soon led to a large number of important next steps.

<this one is tentative at this point> How simple it is (a) to design a whole computer from just one kind of logical element (e.g. “NOT-BOTH”), especially when compared (b) to how Russell and Whitehead struggled to “bootstrap” mathematics, etc., from logic at the turn of the last century. (This is one of those “Point of View is Worth 80 IQ Points” …)
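
A toy Python sketch of the first half of this item: a few familiar gates, and a half adder, built from nothing but NAND (“NOT-BOTH”):

```python
# Everything from one element: gates built only from NAND ("NOT-BOTH").
def nand(a, b): return 1 - (a & b)

def not_(a):    return nand(a, a)
def and_(a, b): return not_(nand(a, b))
def or_(a, b):  return nand(not_(a), not_(b))
def xor(a, b):  return and_(or_(a, b), nand(a, b))

def half_adder(a, b):
    return xor(a, b), and_(a, b)     # (sum, carry)

print(half_adder(1, 1))              # (0, 1)
```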

Lisp, and McCarthy’s general approach to “mathematical theories of computation” and having languages that can act as their own metalanguage. One of the great cornucopias of our field.
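
A tiny Python sketch of the “language as its own metalanguage” flavor; this is a toy evaluator for a Lisp-like subset, not McCarthy’s eval:

```python
# A very small Lisp-flavored evaluator, just to show the idea in a few lines.
def evaluate(expr, env):
    if isinstance(expr, str):                 # symbol
        return env[expr]
    if not isinstance(expr, list):            # number or other literal
        return expr
    op, *args = expr
    if op == "quote":
        return args[0]
    if op == "if":
        test, then, alt = args
        return evaluate(then if evaluate(test, env) else alt, env)
    if op == "lambda":
        params, body = args
        return lambda *vals: evaluate(body, {**env, **dict(zip(params, vals))})
    fn = evaluate(op, env)
    return fn(*[evaluate(a, env) for a in args])

env = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
program = [["lambda", ["x"], ["+", "x", 1]], 41]
print(evaluate(program, env))   # 42
```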

Sketchpad by Ivan Sutherland for so many reasons, including: the approach to interactive computer graphics and the simulations of the graphic relationships, the “object-oriented” approach to definition and deriving new kinds of things (including “masters” and making instances from masters), enormous virtual worlds that are windowed on the display, the use of goal-directed programming with the system solving the simultaneous goals in real-time, etc. And more, including the demonstration that a simulated computer on a computer need look nothing like the underlying hardware or any “normal” idea of “computer”.

The big Shannon et al. ideas about how to have imperfect things be organized in systems that are much more perfectly behaved even if the organizational mechanisms are themselves noisy. Includes all forms of “noise”, “representations”, “communications”, “machines”, etc. and poking deeply into Biology and how living things work. Nice implications for “stochastic computing” of many kinds which are needed more and more as things scale.
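
A trivial Python sketch of the flavor of this (the numbers are invented for illustration): noisy parts, each flipping a bit 10% of the time, organized by repetition and majority vote into a far more reliable whole:

```python
# Imperfect (noisy) parts, organized so the whole behaves far more reliably.
import random

def noisy_send(bit, flip_prob=0.1):
    return bit ^ (random.random() < flip_prob)

def send_reliably(bit, copies=9):
    received = [noisy_send(bit) for _ in range(copies)]
    return int(sum(received) > copies / 2)     # majority vote

errors = sum(send_reliably(1) != 1 for _ in range(10_000))
print(errors)   # far fewer than the ~1000 errors a single noisy send would make
```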

The deep implications of “symbolic computation” (now a very un-funded area) for being able to move from the trivialities of “data” (no matter how voluminous) to the profundities and powers of “Meaning”. This used to be called “AI” and now has to be called “real AI” or “strong AI” (it would be much better under a less loaded term: how about “Flexible Competence”?)

The Internet. Certainly the best thing done by my research community, and the first real essay into the kinds of scaling and stabilities that all computer science should be trying to understand and improve. This was a great invention and development process in all ways, and — by looking at Biology, which inspired us even though we couldn’t really use it directly — it had a reasonable chance to work. That it was able to scale stably over more than 10 (maybe 11) orders of magnitude, as indeed planned, is still kind of amazing to me (even though it shouldn’t be). Judging from most software systems today not being organized like the Internet, one is forced into the opinion that most computerists don’t understand it, or why it is great (and maybe don’t even think of it as the fruits of “real computer science”, because it just works so much better and more reliably than most other attempted artifacts in the field).

When is incremental progress good?

“Better and Perfect are the Enemies of ‘What Is Actually Needed’ ”.

If WIAN can be accomplished incrementally without changes in Type, then there are many advantages to this, including low risk and high integrity of results. (Basically you want engineers to design the bridges and airplanes that will be used by the public, not scientists — who might be more interested in collapses and crashes!)

There are two lurking bugs here, both important:

If only “better” is accomplished without WIAN, then a pernicious process has been started, where “better” will eventually mean “worse”. The bug is confusing “better” with WIAN.

If WIAN means “something qualitatively different” — a change in Type — then “incremental” will almost never get there — it is much more likely to be asymptotic to WIAN. The bug is confusing WIAN with the Type of what we have.

Bottom Line Heuristic: Always look for WIAN as carefully as possible — and take as long as you possibly can to identify it. (The champ in my experience was our hero, friend, and colleague Paul MacCready.)

I hope readers of this will supply examples from their own experience (I have many) but for this answer it will be more fun and instructive to collect experiences.

Does Y Combinator's "100 happy users" advice originate from Butler Lampson?

Eddie’s answer is “plausible”.

Butler did get us to agree to the “100 users” idea.

Eric Schmidt actually worked with Butler when he was at Parc, and certainly picked up the idea while there.

Eddie’s reference (4) seems diffuse on this point, especially regarding when one should think about scaling …

If Alan Kay made an introductory technical course syllabus to teach real computer science to undergraduates, what would be on it?

I will add to and change this from time to time because there is not enough room in a Quora-sized answer to provide a good enough answer, nor enough time to really think things through as needed, but …

What if the intent here is “as a required course for all undergrads” and the question said “real music” instead of “real computer science”?

In both cases there would be a wide spread of knowledge and skills in the student body, from “no previous contact” to “been involved professionally”. I think one difference in the music case is that some of the very involved students would be “classically trained”, and that would almost never be the situation for incoming computerists.

And in any case, most of the students will represent more of a pop culture spread of knowledge. Because there is a lot of playing and other kinds of doing — composing — in music to go along with the theory, I think that it would be a must to have a number of introductory courses — some several years long — to give the less prepared students a chance to build up basic knowledge and skills as “practical doing” within a framework of “real music”.

There’s also prior experience to contend with: it is very often the case that “pop culture” skills that are deeply embedded but not comprehensive can be a big barrier to really learning a “real X”, especially one that involves various kinds of “literacy”, “reading”, and “writing”, and years of work. So starting pretty much from the word Go can be a real boon if there is enough time to develop the deep ideas in the best ways.

Just to pick one more of the many issues that need to be considered: an undergraduate education is supposed to be much, much more than training for a job; the main purpose of college is to take students who think they are trying to get from A to B and to help them acquire enough perspectives and knowledge to see the previously invisible Cs and Ds as possible alternate and often better goals for them.

In other words, the purpose is to aid qualitative changes, not just improvements. So important parts of this curriculum should include “ideas from civilization in general” (I will put this important area to the side for now).

This brings up the woeful fact that so many students enter college without much of what high school was supposed to help them learn — and this forces much too much of the time spent in the first years of college as “remediation” rather than “doing college”.

Finally, the actual process in most colleges is also anti-learning in many ways, especially the lecture-centered, subject/department-centered, grade-oriented processes. (This is one of the reasons I bring up “music” to help think about this question — at least in a good music college, it is not lost that a lot of the time has to be spent actually doing music — and getting better at doing music — not just hearing opinions about it.)

As for content in computing — which is very wide ranging these days — I would concentrate early on two main things — systems and meta, and how they intertwine — and set up what is needed to get the students to a real first stage of fluency in thinking and doing with these ideas.

Both of these have easy starts to them, and both go really deeply into many of the centrally important topics, issues, and difficulties in the current states of the art. For example: simple-meta can be approached by designing and adding new language features in the spirit of the language; deeper-meta can be about changing some of the underlying primitives in important ways. Simple-systems can be about design and making computations, especially where scaling is involved; deeper-systems can start to look at how communications can be enhanced with “alien systems” via negotiation and other techniques that require various kinds of learning.

I would most certainly — with colleagues — invent and make a special programming language and environment to provide a smooth and simple enough vehicle for getting fluent in systems and meta: “Simple things should be simple, complex things should be possible”.

I’d take a different path from SICP, and use a different vehicle, but retain the idea of “safe and expressive meta” as a good route to represent meanings in ways that allow more profound meanings and processes to be created.

The important idea here is parallel to: “Physics is not Math, but you really need Math to really understand Physics”.

Similarly: “Computer Science is not Programming, but you need to make systems via Programming in order to really understand Computer Science”.

The wide ranges of: learners, kinds of help needed, and motivations (including negative) that students bring to a required course have to be carefully heeded.

For example, it is very worth perusing the considerable lore that has now been gathered about teaching computing to wider ranges of students (for example from Mark Guzdial at Georgia Tech, now at U Michigan).

One indication is that entering undergrad students at an engineering school like GaTech are often very ill prepared along many dimensions, and many will have considerable difficulty grasping accurate enough understandings of mechanisms in programming languages in the entry-level courses.

In the old days, one might ask “how did they get into an engineering college without decent prep?” But today, colleges are much more like businesses, and for a variety of essentially monetary reasons will generally try to (a) admit to a level of bursting capacity, and (b) “retain” the headcount rather than sort.

Because of the generally poor high school educations available, this is not necessarily a bad thing, but it has greatly changed the relationship of students to the thresholds of quality needed in many subjects.

In any case, this means that what I’d like to do in an introductory course for the students would really have to have a wonderful development environment to help them build good working models in their minds.

I think of help using a rule of thumb: some students don’t need much help, some will need help of “Type A”, some “Type B”, some “Type C”, and some you won’t be able to figure out how to help. The environment needs to do really well with A, B, and C.

This has many useful parallels with music learning. Some students will have very good pitch senses, others great rhythm senses, others great muscle and other memories, others a “feel” for harmonic theories, etc. It is generally the case in music that the budding musician will not have great skills in every part of music, and will wind up doing a lot of practicing developing what’s needed. So: you really can get much much better at hearing what is going on via “ear training”, at “tracking others to be ‘in a groove’ with them rhythmically”, to “know where you are and where you are going” while improvising, certain kinds of practicing will help the various musical memories to get deeper and faster, etc.

In my answer to “How can one become a better thinker?” I mention a little about learning and thinking better being partly the creation of “brainlets” to offload as much of the work as possible. The students need to understand that this is part of the process for really getting fluent in anything, and to understand what is needed to help the “brainlets” form and grow.

Dealing with the centrally important “one size does not fit all” problem is one of the most important parts of any course design. For one thing it argues against lectures-as-content — even if they were otherwise a good idea (they really aren’t!).

I think an “inspirational talk” every so often can be a great thing, but we shouldn’t confuse what is good about oral cultures — the emotions, romance, theater, and sense of belonging — with literate cultures — the much-larger better-organized multi-perspective self-paced, etc., Better Organization Of Knowledge.

Today we have the possibilities of: a DYNAmic Better Organization Of Knowledge, and we need to make this, and use it to the fullest.

Here’s another analogy to music learning. Generally, one has a private lesson with a teacher once a week (maybe twice, though that is pushing it). Most of one’s time in a week is spent practicing and learning. Almost without exception there are ongoing group experiences: orchestra, chamber music (including “piano band” if one is a keyboard player), chorus, etc. And there will be some courses e.g. in theory, composition, repertoire, etc.

We can see that the balance between a bit of private tutoring and considerable work on one’s own is very different from standard US practice, but is like the “tutor” system that has been used at a number of UK universities such as Oxford and Cambridge. The tutor acts as a guide and quality control, whereas the student has to do most of the digging and practicing and other forms of doing and learning (this is a good thing!).

As for the group experiences, this used to be easier and more common in computing’s “deep past”. Major universities would get — or even better, make — a computer, and the students would form part of the larger design and building experiences for both HW and SW.

Even if a computer was purchased, the operating system and languages on it were usually pretty bad, and there was a lot of motivation to do a better, more useful set of tools.

(Today it is still true that the OSs and languages are not very good on purchased computers, but the perspective to see this has been pretty much lost, and the will — and even expertise — to design and make a better OS or language (or better subsystem) in university has been lost along with it.) This has removed from students of today much of the “real” in “real computing”, and quite a bit of the fun.

I think for a course like this, as a stand-in for the “new computer” in the old days, I’d pick something that (a) is like a whole computer, (b) is going to be used every day, and (c) is badly conceived; then (d) I’d help them see that “normal” does not necessarily mean “good” or “acceptable”, and (e) have them learn by doing the designs and building something much better.

How about the web browser as a target? (Yikes!). This could be doubly interesting because it runs on top of the Internet — perhaps the best large system ever done in computing.

This would be a bit tricky to set up, but quite interesting to see how this could be approached as a large group target. One would start by looking at what TCP does, and what it could do, and what the scaling of users — especially authors — implies about what is needed to deal with the major issues for both “reading” and “writing” and “extending” the new medium across billions of nodes and trillions of objects.
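
As a reminder of what TCP actually hands you at the API level — a reliable, ordered byte stream between two endpoints — here is a minimal Python echo pair using only the standard library; the port number is arbitrary:

```python
# A minimal TCP echo pair: the reliable byte-stream primitive such an
# exercise would build on.
import socket, threading, time

def echo_server(port=5007):
    with socket.create_server(("127.0.0.1", port)) as srv:
        conn, _ = srv.accept()
        with conn:
            conn.sendall(conn.recv(1024))    # echo back: reliable, in-order bytes

threading.Thread(target=echo_server, daemon=True).start()
time.sleep(0.2)                              # give the listener a moment to start

with socket.create_connection(("127.0.0.1", 5007)) as s:
    s.sendall(b"hello, byte stream")
    print(s.recv(1024))
```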

(It’s worth pointing out that quite enough was known about systems design in the 70s to make a few very good operating systems, programming languages, and the Internet itself. Unfortunately, essentially none of this knowledge is apparent in the designs of the Web Browser, and precious little for the Web itself.) The good news is that an amazing amount of this can be cleaned up and simplified by much better designs that run directly on IP (or on an extension of TCP/IP).

This makes a lot of sense to me because today “the computer” is “the Internet of computers” both connecting physical hardware together outside of the machines, and also connecting the virtual software together inside the machines, and all as one network idea.

<Hopefully not much more to come here>

---

I’m guessing that it would take a minimum of a year — more like two — to do the design and prep for this course. For one thing, it would take a while to design and make the programming language and environment that would be used, and it will take more time (and real effort) to have the environment also be able to supply the useful “helps” that many of the students will need.

Many parts of the curriculum will need to be tested, so some of the design and testing can be done as projects in other courses, and with the equivalent of the paid students who are used as test subjects in psychology experiments.

Our experience in doing past curricula in schools is that it takes about 3 years to get even one that is adapted from already tested materials to jell and work. Part of this is to give the teachers time and experience to get comfortable with both the subject matter and the rhythms induced between this and the spread of students who will need to be helped in various ways. Since — here — we are planning to do the teaching ourselves, we can probably cut off a year from the prep time. So let’s say “really two years”, and “maybe we could start with examples in one year of prep”.

So, here, I can only jot down some notes — even just “impulses” and “instincts” — that are occurring to me as I work through the ramifications of the question (these often don’t pan out, but I like to write them down as they occur).

The first big impulse is “start with something massively parallel in the spirit of StarLogo, NetLogo, and Kedama (in Etoys)”, maybe with tinges of Elixir/Erlang thrown in.

Because “cognitive load” is a main factor with introductions to new things, we want to somehow both (a) have the manifestation of the design be fun and really interesting — so it will need to involve interactions between the elements — and (b) we would like to have this experience use few enough elements to avoid as much as possible “induced blindness from complexities”.

Examples like these will drive both the curriculum design and the design of the language and environment to try out ideas by using them, making them, etc.

The second impulse was “vacuum cleaner!” (I have a feeling that this will wind up further down the list, and that something more like “epidemic!” might be first. Still, what is actually going on with a vacuum cleaner is wonderfully opposite to what most people suppose, and the modeling of it is simple and easy to understand.)
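
A StarLogo-flavored Python sketch of the “epidemic!” impulse, with all the parameters invented for illustration: many simple agents wander on a grid of patches, and infection spreads by sharing a patch:

```python
# Many simple agents on a grid of patches; infection spreads by proximity.
import random

SIZE, AGENTS, STEPS = 20, 60, 50

agents = [{"x": random.randrange(SIZE), "y": random.randrange(SIZE),
           "sick": i == 0} for i in range(AGENTS)]   # one initial case

for _ in range(STEPS):
    for a in agents:
        a["x"] = (a["x"] + random.choice((-1, 0, 1))) % SIZE
        a["y"] = (a["y"] + random.choice((-1, 0, 1))) % SIZE
    # agents sharing a patch with a sick agent become sick
    sick_patches = {(a["x"], a["y"]) for a in agents if a["sick"]}
    for a in agents:
        if (a["x"], a["y"]) in sick_patches:
            a["sick"] = True
    print(sum(a["sick"] for a in agents), end=" ")   # the infection curve
```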

<….. more ahead …..>

---

A mapping to the “outside world” is eventually important, but a main aim would be to have the pedagogical system not exhibit properties that make it difficult to think about and use certain important good ideas (this is a real problem in most language systems in use today, whether academic or commercial — i.e. they have given up too much meta for the sake of other considerations, some merely historical and/or “religious” baggage).

So: one of the main purposes of any early course in anything is to help the students gain enough knowledge and perspective to start to be able to criticize the current state of knowledge in the field that they are being required to learn.

Was the Xerox Alto a prototype or a finished product?

It was neither. It was a “time machine” to allow individual researchers starting in 1973 to work about 12–15 years in the future they were trying to invent rather than being forced to use and work around the hardware and software of the early 70s.

Another way to look at this is: if you are trying to invent something new and different that will take a while to do — and in our case it was all the new software and user interface design that needed to be invented and built — and there is an exponential that looks “reliable” (as Moore’s Law did back then) — then you will be hopelessly behind if you don’t anticipate the changes that will be rendered by the exponential during the years of development.

A further perspective on this is that since “a computer is a computer is a computer”, you can reliably buy your way into the future by making a supercomputer in the present, knowing that it will be at commodity costs in 10–15 years.
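
The arithmetic, as a tiny Python sketch assuming a doubling roughly every two years:

```python
# "Buy the future" arithmetic: how much cheaper the same compute becomes.
def cost_ratio(years, doubling_years=2):
    return 2 ** (years / doubling_years)

print(cost_ratio(12), round(cost_ratio(15)))   # ~64x and ~181x cheaper per unit of compute
```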

For more about the structure of the Alto, see “To what extent did the Xerox PARC Alto inspire the Apple Macintosh?”

The two big benefits of having a “supercomputer per person” were (a) that many experiments could be easily done without optimizations because of the speed available, and (b) if you did optimize then you could make the kind of software that would run on commodity machines (also optimized) 10 years in the future. “(a)” produced screen painting primitives, the GUI, and Smalltalk among many other new things, “(b)” produced what would be Microsoft Word, and other PC apps.

This way of looking at things was quite standard in the ARPA research community, and those of us who went to Xerox Parc took this spirit with us.

Because the future — as we saw it — would have one computer per person all networked together with resources, the “supercomputer per person” had to be engineered far beyond a demo, and had to be easily replicatable. (At Parc we always tried to engineer for ~ 100 users, so we had to be able to make 100 Altos, the Ethernet had to be able to handle 100 Altos, the laser printer had to be able to handle 100 users, the software had to be reliable enough and documented well enough to support 100 users, etc.)

Most of the design and engineering magic on the Alto and subsequent HW was primarily the work of Chuck Thacker.

The Alto was quite successful in meeting its design goals, so eventually almost 2000 were made: enough to spread around Xerox, and then to many of the ARPA universities.

Meanwhile, Parc kept on inventing many more newer machines to stay in the future.

The viability of an Alto type machine as a product really depended on the VLSI technology for main memory. The first year that an Alto-type machine could conceivably have been made a product was ca 1976–7. It would have then been salable for about what a stand-alone “word-processing machine” would be — so not cheap, but with lots more functionality and flexibility.

How do you understand that you don't understand the understanding demanded by a problem?

I always assume I don’t have the necessary understanding to start on anything (this is a good heuristic for us humans to insert before thought and actions of any kind).

One of the consequences of this is that because many “problems” are quite often not actually “the problems”, this provides better starts to “finding more appropriate problems”, and to finding better ways to deal with them.

What was computing like in the 1960’s?

I started computing ca 1961, and my job in the Air Force starting in 1962 was as a programmer for Air Training Command at Randolph AFB in San Antonio. I entered grad school at the University of Utah ARPA project in 1966.

With all due respect to Andrew Silverman’s answer: although there weren’t millions of computers back then, there was more demand than supply for programmers — so anyone with interest and aptitude could definitely get access (you had to go to the computers).

Because of the Cold War, a lot of DoD money was being spent on the development of computing, pretty much right after WWII, and especially in the 1950s with the development of the SAGE early warning airplane tracking system. This had started at MIT with Whirlwind ca 1950, and many important early technologies happened on this machine, including: displays, pointing devices, interactive dynamic language, core memory, etc. A lot of engineering was done on these in the SAGE development, and this helped raise all boats.

The early 60s also saw a deeper effort to replace punched card accounting machines (PCAM) with computers (this led to more than 14,000 1401 computers being built — and to the way IBM thought about the eventual System 360 in 1964).

JCR Licklider started the ARPA Information Processing Techniques Office (IPTO) in 1962, and this agency plus ONR and the Air Force did the bulk of funding for visionary long range projects in computing. IPTO alone funded about 16 large research community efforts spread around the country.

The first fruits of ARPA funding can be seen by getting — it’s online — the Sept 1966 special issue of Scientific American on “Information”. This special issue was completely written by the ARPA community, including John McCarthy, Marvin Minsky, Ivan Sutherland, Dave Evans, etc. I cherish this because I first met this community in Nov 1966 as a grad student, and this was a great summary of what they had accomplished in the first 4 years.

Today we have a paradox. We have enormous amounts of inexpensive computing power but very few ideas out and about: commercialism has produced a boring low-level orthodoxy and created such a monumental stultifying “normal” that I think it is much harder for youngsters to think about interesting things while being suffocated in “really nothing much at all (but no oxygen either!)”.†

In the 60s, there was also a boring low-level orthodoxy — especially IBM — but because of ARPA et al. great ideas not only abounded and were being worked on, but many of the ideas were superior — in absolute, not relative, terms — to most ideas in view today. ††

When Congress in 1970 made a huge error and used a too broad brush to recast ARPA, ONR, etc., they threw the baby out with the bathwater and left most of the important computing research of the country to the whims of businesses — most with IBM’s fatal disease — and government agencies — like NSF and the new DARPA — that were much too much under the thumb of Congress.

In sum, computing in the 60s had a large super exciting part to it — it was a Grand Romance — that had a big payoff in the 1970s. I had fun as a programmer in the early 60s, and the best time in my life from the middle 60s to the end of the 70s.

---

† There are a very few exceptions today to keep us from being totally gloomy, for example: Bret Victor’s “DynamicLand” is a breath of fresh air full of oxygen and fruitful ideas for a much more exciting and humane future.

†† Here are a few personal favorite highlights of advances in computing in the 1960s (many more will have to go unmentioned). Many, but not all, funded by ARPA-IPTO.

1960

• JCR Licklider “Man-Computer Symbiosis” which started the vision that led to ARPA-IPTO funding and much of the personal computing and networking technology of today.

• Algol 60, BNF, and Ned Irons’ “Syntax directed compiling”.

1961

• The Burroughs B5000 (main designer Bob Barton): a two CPU “higher-level” computer with tag bits on every word, unforgeable references, protective memory management and swapping, parallel processing, and a hardware implementation of a “byte-coded” execution engine tailored for Algol-like languages. A “Simula-machine” several years before Simula, and the first system to implement an operating system completely written in a higher-level language. Can’t be too highly praised.

• CTSS/MIT one of the first real interactive time-sharing systems as a “computing utility” for a whole community.

• Paul Baran at RAND presents first briefing on the possibilities of packet-switching networks

1962

• Lisp 1.5, John McCarthy, et al. The very important follow-on from the invention of Lisp (can’t be too highly praised) in the late 50s.

• JCR Licklider starts ARPA-IPTO with initial funding to MIT for “Project MAC”. Can’t be too highly praised.

• “Sketchpad” by Ivan Sutherland at Lincoln Labs. Not just the invention of interactive computer graphics as we know it today, but with “masters and instances” initiated “object-oriented” representations, and used “constraint solvers” as a very high level programming language. Can’t be too highly praised.

• Wes Clark’s LINC at Lincoln Labs — arguably the first personal computer that included all the standard features

1963

• John McCarthy — Situations, Actions, and Causal Laws — a landmark follow up to the “Advice Taker” paper “Programs With Common Sense”. I believe this is the first introduction of the idea of “labeled states” and “situations” as gatherings of “labeled states”. Can’t be too highly praised.

• JOSS at RAND, Cliff Shaw. The first truly well designed interactive programming system for end-users

• Thomas Evans, Geometric Analogy Solver — my vote for the top thesis from the incredible first round of “real AI” theses from Marvin Minsky’s students.

1964

• SRI: Engelbart and English invent “mouse” pointing device

• RAND: Tom Ellis et al, invent the RAND Tablet, a truly great pointing device

• Simula I converted Algol to the language that served as the model for C++ in the early 80s. A great combination of insights (implemented in 1965, and published in the US in 1966).

• Meta II by Val Schorre was a wonderful tiny and powerful self-bootstrapping way to create programming languages, including itself.

1965

• Ivan Sutherland’s IFIPS paper inventing virtual reality and augmented reality.

• Butler Lampson’s first designs and writings about systems design, protection, and inter-process communication fomented by the “Project Genie” time-sharing system (which became the SDS-940, and the computer used by Engelbart for “the mother of all demos” in 1968). Butler can’t be too highly praised.

• Niklaus Wirth — Euler — was a compact detailed runnable example and model of making a generalization of Algol (kind of Algol meets Lisp, as motivated by van Wijngaarden). A great set of ideas to learn from, especially if his syntax apparatus were replaced by e.g. Meta-II-in-Euler. Used a small interpretive virtual byte-coded machine modeled after the earlier B5000 hardware.

1966

• Gab Groner’s great gesture and hand drawn character recognition system at RAND, using the RAND tablet.

1967

• Dave Evans, Gordon Romney, et al, with breakthrough by John Warnock, University of Utah ARPA Project — Invention of modern continuous tone 3D graphics

1968

• Doug Engelbart, Bill English, Jeff Rulifson et al — “the mother of all demos” showing the public an elevated rich example of many facets of “personal computing”, idea linking, collaborative simultaneous creating, etc., from a cosmic vision that has largely been lost today. The overall vision and effort can’t be too highly praised.

• Ivan Sutherland, Bob Sproull, Chuck Seitz et al. — VR and AR for the first time

• Tom Ellis, et al. — The GRAIL System — the first completely pen-based system, recognizing drawings and text to make hyperlinked simulations, etc. A landmark in “intimate computing” and user interface design.

• Ned Irons — IMP — the first really workable and convincing fully-extensible language

1969

• Pat Winston — Learning Structural Descriptions From Examples. This thesis was published in 1970, but the work was done in the late 60s. A truly wonderful piece of work that creates a generalized theory of analogy and representation that was motivated by the previous work of Tom Evans, but breaks new ground and extends the range of ideas into programming languages.

• Carl Hewitt — PLANNER — both the spiritual and an actual parent of Prolog and many other inference systems to come that were also programming languages.

• Dave Fisher — A Control Definition Language — also published in 1970 but done in the late 60s. This was a powerful comprehensive look at extensible languages — especially with regard to making mechanisms to “manipulate time” using Lisp-like kernel metaprogramming.

• Nicholas Negroponte — The Architecture Machine (MIT Press) — published as a book in 1970, but work done in the 60s. By far the best account of the longer future that involved computers. Partly because Nicholas trained as an architect, and partly from his larger take on what was around him, this was the deepest synthesis and look-ahead at profound future possibilities. Can’t be too highly praised: just multiple meanings of the dedication alone can be pondered fruitfully for a long time “To the first machine that can appreciate the gesture”.

• ARPA Community — design and building of ARPAnet and first transmissions. Prime movers were Bob Taylor, Larry Roberts, Wes Clark, Frank Heart, et al.

<more to come>

How can one become a better thinker?

To me, the first thing to notice about human thinking — via scrutinizing the rare examples of “first class thinking” we can identify — is that we humans are pretty much not genetically equipped for thinking at all!

We are most well set up for learning how to fit into our cultures, and make our way in them, mostly socially. We are a little bit clever, and like other primates are able to cheat in numerous ways, but we very often aren’t smart enough to also take into our minds the consequences of cheating.

We have “lots of ‘coping genes’ but essentially no ‘progress genes’ ” — in fact, it appears that even the idea of progress had to be invented (mostly in the 18th century). The philosopher and mathematician Alfred North Whitehead remarked that “The great invention of the 19th century was Invention itself” (suddenly everyone was inventing!)

Most of the processes we associate with “good thinking” seem to be inventions, the good ones very rare and initially far apart. Despite the power of these, our thinking was so dim that it took quite a while to realize that one of the things we should be doing is to purposely invent better ways to think, and then figure out how to teach them to children to create much more able adults than we are. (We still do not see much call for this in most public media.)

Einstein came up with a great comment: “We cannot solve our problems with the same levels of thinking that we used to create them”.

This links up starkly with the Dunning-Kruger Effect: that many people are not able to think well enough to see that they are not thinking well enough. There is a very real sense in which all of us have this problem.

A parallel allied concept is sanity, which is usually assessed in a relative normalized manner i.e. “what the majority of people in a culture do, and especially if in accord with the culture’s norms, is considered sane, and that outside these norms is considered not-sane.” But if we look at sanity as “the goodness of the mapping between what’s in a mind and what is actually in the environment in which the mind exists”, then we can see that all human beings have “delusional disorders”, and many of the most dangerous ones are held by much more than a majority of humans! This includes the Dunning-Kruger delusion that they are “generally thinking well”.

Another part of considering what good thinking might be about is to notice that ignorance very often resembles stupidity. Imagine being born with twice Leonardo’s IQ but in 10,000 BCE!

And Leonardo, supersmart as he was, was not smart enough to invent any useful engines for any of his fantasy vehicles. He was in the wrong century — there was not enough knowledge for him to use and reshape with his intellect.

Similarly, it took geniuses to invent calculus but many much less smart people can learn it and become more powerful thinkers about many kinds of change than the geniuses of antiquity.

We can then reflect on what someone nowhere near Leonardo or Newton — Henry Ford — was able to do. Why? Because of the vast change in context — how to look at and think about the world around us — for which Newton was the main catalyst and cause.

I think of this as “Knowledge is Silver, Context is Gold, IQ is often a Lead Weight!”

Or: “Context is worth 80 IQ Points!”

This is especially true if the Knowledge is (a) drawn from the strongest Contexts, (b) some of the Knowledge is the knowledge of Contexts (or Points of View, or Perspectives, etc.), and (c) some of the knowledge is what has been learned about how to think much better than our genetic minds can by themselves.

Almost 400 years ago Francis Bacon wrote about the “four Idols” that humans worship which confuse our attempts to think. In modern vernacular, we have “bad brains/minds” from our genetics, our cultures, our languages, and our academics. He called for a “new science” to be invented that would mitigate these as much as possible (much of what he called “new science” is what we call “Science”). One part of this is the idea that we can find and invent “methods and tools” which when carefully learned and used can help us think much better than our traditional processes did.

A classic study of human thinking problems — both in general and associated with language use — is “Science and Sanity” by Alfred Korzybski.

The field of Cognitive Psychology — of George Miller, Jerome Bruner, etc. — started to measure many limits to human thinking, for example that we can only deal with a very small number of things at once, and for something new we are almost blind, deaf, and dumb because we don’t yet have mental organizers for what we have to deal with. (This is called “Cognitive Load”.)

A more modern identification of human mental “noises, glitches, and barriers” is part of the work of Kahneman and Tversky — see Kahneman’s book “Thinking, Fast and Slow” — which includes how humans form valuations — the field of “Behavioral Economics” — and also unearthing the many “Cognitive Biases” we routinely exhibit. The Wikipedia article lists more than 100 that have been found so far — but readers will find it easy to identify and add more. For example, not included in the K&T list are (a) we confuse both our perceptions with “reality” and “normal” with “reality”, and (b) we often will generalize a good enough idea and then make dogma (and even religions) from it.

So we can certainly make a start on learning how to think better by (a) identifying existing barriers of all kinds, and then to find and invent heuristic workarounds that will help eliminate noise and increase clarity, and (b) from the other direction, to identify the strongest known ways to think clearly — for example, the methods and tools of science — and learn them so fluently that they will be at least as automatic as our less able genetic reactions.

Just a note here on this kind of learning. If you remember the stages you went through to learn to drive a car, the most striking were (a) the beginning stages that involved over-controlling, tunnel vision, not being able to hear the person trying to help, not being able to see stop signs and children, not knowing what gear one is in, etc. and (b) a few months later being able to steer the car, listen and talk to the other person, automatically be aware of stop signs and road conditions, etc. (This is a general learning progression for most things.)

What is happening during the learning is that a lot of the initial effort had to be done with your cognitively smart, but slow part of your brain, and this is easily overloaded. The learning starts to build little specialists — I call them “brainlets” — that offload much of the work to faster, less smart, but more routinized parts of the brain. This leads to both skills, and a certain resistance to learning different ways to do things.

Quite a bit of learning to think better is accomplished by doing the various processes that will build “brainlets” for helping to think. Some of them will damp down many of our genetic responses, and some of them will provide a variety of points of view, analogies, etc., and a whole host of heuristics to help.

Seymour Papert used to say “You can’t think about thinking without thinking about thinking about something”. In other words, it will really help to have important issues and ideas, and things to learn about that require much better thinking. Then, the many things that are known about thinking, how poorly we are generally at it, and what we’ve learned about doing it better, can be brought to bear.

What made APL programming so revolutionary?

APL stands for “A Programming Language”, the title of the 1962 book by Ken Iverson about what initially was called “Iverson Notation”. Part of the reason for the “notation” label was that it was used extensively for a number of years as “a notation” before it was implemented as “APL/360” (on the IBM 360 series of mainframes).

Ken Iverson was essentially a mathematician, but one who also had a physics background; he trained under Howard Aiken at Harvard in close proximity to the various computers designed and built there, receiving his PhD in Applied Math with a thesis on how to deal with very large sparse matrices.

He started to use mathematical tools to describe computations and computers, and soon found these to be lacking. This led to a number of inventions very much in the spirit of mathematics that allowed many more structures and operations to be easily defined and “notated”, many by “functional projection”.

One of the most interesting things about “Iverson Notation” at this stage was that not having an implementation greatly helped — IMO — what he tried to do at the descriptive level: there were no worries about whether this or that could be implemented at the time, or whether there would be enough computing capacity for speed or space to eventually implement the notation.

It was in this form that I and many of the other grad students of the mid-60s learned “Iverson”. My first CS course was from the legendary and wonderful Nick Metropolis, the main architect and builder of the Los Alamos computers, especially the “Maniac” series. Nick liked “Iverson”, and used it extensively for both hardware and software descriptions. A year or so later, Bob Barton in his notorious first course in “Systems Design”, required us to “get and read and learn Iverson’s book”.

To motivate what Ken Iverson decided to do, it is worth looking at the history of Maxwell’s Equations — 4 ideas (can be just 2 or 1) that will fit on a T-shirt. However, one of Maxwell’s main renditions was not in the form we are familiar with, but was expressed as 20 partial differential equations in x, y, z coordinates.

image

This is not a great T-shirt!

Helmholtz and especially Oliver Heaviside did a fair amount of work to use the definitional possibilities of mathematics to hide coordinate systems with vectors and details of the PDEs, with “operators” (Div, Curl, Gradient … “and all that”).

image

A terrific T-shirt!
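For reference, the compact vector form (here in SI units) reads:

\[
\nabla \cdot \mathbf{E} = \frac{\rho}{\varepsilon_0}, \qquad
\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}, \qquad
\nabla \cdot \mathbf{B} = 0, \qquad
\nabla \times \mathbf{B} = \mu_0 \mathbf{J} + \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t}
\]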

You can think of the operators “gradient”: ∇, “divergence”: ∇•, “curl”: ∇×, as “meta”, that act a bit like macros to rewrite functions in a more complex “decorated form”.

The basic idea here is to get “whole ideas into one eyeful” by inventing notations and processes that can do this, and consequently requiring readers to learn the new notations fluently enough so there is a net benefit rather than just more noise.

When this is done well, the new “meta-stuff” becomes generally useful (like the grad, div, curl “and all that” above). An example in APL is the operator “.” , which is generalized inner product that can take any APL functions as arguments. For example, what we think of as “matrix multiplication” is +.* in APL (see inner product in APL).
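To give a rough feel for that generality in more conventional notation, here is a sketch in Python (inner_product is an invented name, and real APL is terser and far more general): reducing with one function over an elementwise combination with another yields ordinary matrix multiplication, and many other “products”, from the same machinery.

from functools import reduce
from operator import add, mul

def inner_product(f, g, A, B):
    # A rough rendering of APL's generalized inner product "A f.g B" for
    # the matrix case: reduce with f over an elementwise combination with g.
    Bt = list(zip(*B))                       # columns of B
    return [[reduce(f, (g(a, b) for a, b in zip(row, col))) for col in Bt]
            for row in A]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(inner_product(add, mul, A, B))   # the usual matrix multiply: [[19, 22], [43, 50]]
print(inner_product(max, min, A, B))   # a different "product" (max.min) from the same machinery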

People who learn math are quite willing to do this learning and gain the necessary fluency — but there’s considerable evidence that most computer folks are not at all willing to do lots of training in special tools that would make a difference in “being professional”.†

This has led to the idea that APL is not readable. In fact, it is both very readable and very compact. This is not to say that a face lift wouldn’t help — the standard notation for APL was derived to fit on an IBM Selectric golf ball typewriter terminal, and could be greatly improved today.

The second interesting idea in APL is “projection”. This is much more relatable today in an era of “map/reduce” than it was in the 60s or 70s, even though one could also write a good “mapping” function in Lisp (and it was also an “operator” because it could take a function as one of its arguments). In the early 70s, Unix happened, and Doug McIlroy invented “pipes programming” to allow, in this case, “data” to be passed through “functions” to be reformulated.

However, the big uses and extreme ranges of this way to program were explored earliest and most extensively in “Iverson Notation”, and to a slightly lesser extent in the actual language “APL”.

Attaining fluency in APL as one of three or so main ways to think about programming “is good for one’s mind”. As in the later map/reduce, one “sends” a structure in parallel through a cascade of shaping functions and then a cascade of trimming and extracting functions to finally get a result. (One must suppress one’s imagination of just how big some of the intermediate structures might be getting … this is also good for one’s mind!)
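A hedged sketch of that shape, in Python rather than APL (the example itself is invented): build a possibly large intermediate structure with shaping functions, trim it, then extract.

# Shape, then trim, then extract: the intermediate list is deliberately large.
data = range(1, 1_000_001)

shaped  = [n * n for n in data]               # shape: square everything (a big intermediate!)
trimmed = [n for n in shaped if n % 7 == 1]   # trim: keep only what we care about
result  = sum(trimmed)                        # extract: collapse to a single value

print(result)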

There is real clarity to be gained for both writers and readers of APL.

A number of us in our research group at Parc liked APL quite a bit, and it was clear that much more could be done using polymorphic operations and the extension features of Smalltalk (only a few of these experiments emerged publicly in the 80s). But, imagine gazillions of objects provided with “events, suggestions and hints”, etc.

As always, time has moved on (and programming language ideas move much slower — and programmers move almost not at all).

There are several modern APL-like languages today — such as J and K — but I would criticize them as being too much like the classic APL. It is possible to extract what is really great from APL and use it in new language designs without being so tied to the past. This would be a great project for some grad students of today: what does the APL-perspective mean today, and what kind of great programming language could be inspired by it? ††

---

† This seems rather like the disinclination of so many pop culture musicians to learn to read and write music notation, despite the tremendous advantages for doing so in many areas — and in fact what seems to be a disinclination in much of our culture for learning to fluently read and write the written form of their own language. It’s not that you can’t do art in “oral modes”, but that the possibilities for art are so expanded when literacy is added.

†† As an example, a looser more versatile version of this kind of programming can be done using dataflow between processes that themselves are made from projective mappings, and this could yield a very useful and beautiful language. This is what Dan Amelang and some of his colleagues did to make the Nile Language, which was especially aimed at “graphical mathematics and rendering”. In the STEPS project of some years ago, this allowed virtually all of 2.5D “personal computer” graphics — including rendering, compositing, filtering, curves, fills, masks, etc., to be defined and run in real-time in under 500 lines of code. This replaced an estimated 50,000 to 100,000 lines of C++. Because of the dataflow and the independence of the mappings, this was able to be set up so it could use as many cores as available to run the code. (And so forth.)

500 lines of code is only about 10 pages and it can be shown as an “eyeful” on a desktop screen:

image

This is partially low-hanging fruit since mathematics does underlie computer graphics at all levels. The kinds of ideas that APL first brought to light make “runnable mathematics” possible (and when it is possible, it is as wonderful as it gets!)
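To give a loose feel for “dataflow between processes that themselves are made from projective mappings” (this is not Nile code, and every name below is invented; it only mimics the shape of the idea, in Python):

# Each stage is a simple mapping over a stream; stages are connected by dataflow.
def coverages(n):                              # pretend source of pixel coverage values, 0.0..1.0
    for i in range(n):
        yield (i % 11) / 10.0

def apply_color(stream, r, g, b):              # map coverage to a premultiplied RGBA pixel
    for c in stream:
        yield (r * c, g * c, b * c, c)

def composite_over(stream, dst=(1.0, 1.0, 1.0, 1.0)):   # Porter-Duff "over" onto a background
    for (sr, sg, sb, sa) in stream:
        k = 1.0 - sa
        yield (sr + dst[0] * k, sg + dst[1] * k, sb + dst[2] * k, sa + dst[3] * k)

pipeline = composite_over(apply_color(coverages(5), 0.2, 0.4, 0.8))
for pixel in pipeline:
    print(pixel)

Because each stage only sees the stream flowing through it, the stages stay independent, which is part of what lets this style of system spread its work across as many cores as are available.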

What does Alan Kay think about inheritance in object-oriented programming?

Simula I didn’t have inheritance (paper ca 1966) and Simula 67 did (paper ca 1968 or so). I initially liked the idea — it could be useful — but soon realized that something that would be “mathematically binding” was really needed because the mechanism itself lets too many semantically different things be “done” (aka “kluged”) by the programmer. For example, there is no restriction of any kind requiring a subclass to resemble a superclass, to be a refinement of a superclass, etc. Everything relies on the cleanliness of mind of programmers (and even the most clean of these often just do things they need when in the throes of debugging).

So when I designed Smalltalk-72 — and it was a lark for fun while thinking about Smalltalk-71 — I thought it would be fun to use its Lisp-like dynamics to do experiments with “differential programming” (meaning: various ways to accomplish “this is like that except”).

Also, Simula-like classes were perhaps not the best abstract “instance definers and producers”. They were very Platonic, and perhaps oversimple and overrestrictive. Plato has Socrates express “two principles” about defining Ideas (in the Phaedrus):

First, the gathering of scattered particulars into one Idea so that everyone understands what is being talked about;

Second, the separation of the Idea into parts by dividing it at the joints as nature directs, not breaking any part in half as a bad carver might

This is a pretty good early definition of modular design and perspectives on systems.

Later, Aristotle advocates a looser approach: to gather examples of an Idea rather than to try to define it (the set of the examples is used as the primary definition). This has some real benefits: for example, a particular example can be part of the definition of more than one Idea.

While pondering these ideas in the late 60s, I found that the “real AI” (I have to use this term today yikes!) people were doing the most interesting things in trying to represent meaning — their thinking was far beyond the programming language world. And they didn’t really care how slowly things ran (which was really a great help in thinking in those days!).

To pick one of my favorite examples, the work, and then the thesis (ca 1970), of Pat Winston at MIT — about “Learning Structural Descriptions From Examples” — was just really good. Pat used “semantic nets” for representations, but if you were thinking “objects” you could see that his approach was very strong. For example, he was the first (I think) to have full descriptions for “relators” — in today’s terms, this would be to have classes for polymorphic methods to (a) protect their generic meanings, and (b) to allow relations on them. So a relator like “to the left of” could be asked, “what is your inverse?” and it would reply “to the right of”.
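Here is a toy rendering of that “relators are full descriptions” idea, in Python with invented names (Winston used semantic nets, not classes): the relation is itself an object that can answer questions about itself, such as what its inverse is.

class Relator:
    """A relation as a first-class object that can describe itself."""
    def __init__(self, name):
        self.name = name
        self.inverse = None

    def set_inverse(self, other):
        self.inverse, other.inverse = other, self

    def holds(self, a, b, facts):
        # a relator can also answer queries, using its inverse when needed
        return (self.name, a, b) in facts or (self.inverse.name, b, a) in facts

left_of, right_of = Relator("left-of"), Relator("right-of")
left_of.set_inverse(right_of)

facts = {("left-of", "lamp", "couch")}
print(left_of.inverse.name)                    # "right-of"
print(right_of.holds("couch", "lamp", facts))  # True, derived via the inverse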

I could see that this could be a wonderful thing for OOP programming, not just “OOP in AI”. For example, it provided one way to deal with search and priority conflicts if you had multiple superclasses (and any reasonable definition scheme with classes would need to deal with something like multiple superclasses).

Another interesting idea around this time was “relator inheritance” — that is, to have what an object means to actually be the collection of its parts as a system. Larry Tesler did some very interesting and compelling experiments in Smalltalk-72 using this idea.

The “real AI” folks at Parc were also doing a number of really good things — mostly in Lisp — but Ira Goldstein and Danny Bobrow did the PIE system in Smalltalk, and IMO greatly extended and added good ways to think about “objects” (especially “multiple perspectives”, which was perhaps the earliest set of designs for what we might call “traits” today — this was done very well I thought).

Because our goals at Parc were to invent as rich a version of “personal computing” (especially for children) as we were able — and not to invent OOP programming per se — we followed none of these, nor many other idea trails as they came up, but concentrated on user interface, simplicity of expression, “eternal execution”, safe and comprehensive interactive debugging (even of the debugger), etc.

This also came to a head when the plan to replace the Alto in a timely fashion with a new machine more powerful along several dimensions turned out not to be politically feasible within Xerox. This led to the various groups there making do with what we did have. Going larger and faster for us happened because of the brilliant design and implementation work of Dan Ingalls and several others in the group. This resulted in Smalltalk-76, which was almost 200 times faster than ST-72, and could manifest about a megabyte of objects. One fallback was to a single inheritance hierarchy, and most of the time we were able to control it well.

Another way to look at all this from a different point of view than from that of a programming language, is to think about what the dynamic system of intercommunicating objects might look like. Here the richness/simplicity of the messages that can be sent, the integrity and transparency of the modules, etc., are some of the most important considerations. Then one can go back and ask questions about how these are best programmed in ways that will help the limitations of programmers.

---

Endnotes:

0. I should have emphasized above — and elsewhere — that the idea of instancing is crucial for this kind of systems design. The reason I left classes out of “criteria for object-oriented” is mentioned above a little. For example, prototyping — as advocated by Henry Lieberman and done in an interesting way in Self — has its uses and charms. I spent a few years arguing for “exemplars” that would provide more of a picture of what instances would look like (I was upset that the “document class” in Smalltalk was extremely bare — everything that was interesting about a document was added on the fly when a document was being made — it was hard to see the possibilities in the class description.)

I wrote a paper in the late 70s that looked at all of these mechanisms, and many others, as examples of views. That might have been overkill, but semantically, it made quite a bit of sense.

1. I felt somewhat the same way about inheritance as I did about types, in that both needed to be a lot better than they were in order to pay for the overheads and pitfalls of using them. Both of these are basically “control of meaning/semantics” problems. There have been efforts — e.g. in MIT’s “Flavors” and Eiffel — to pin down some of the degrees of freedom to get more meaningful expression.

2. Speaking of types, another interesting idea for Smalltalk-72 that never made it strongly into the language came from the idea of “gauges”, an entity that acted like a number in a variable but constrained its values. The idea was to put an instance of this into a variable and then never rebind the variable. The instance would contain the magnitude, do arithmetic, check value ranges, etc. It could rebind, but only within the constraints of the particular gauge. Etc.

This was possible in ST-72 because “<-” was just a token, and one had to do the equivalent of Lisp’s SETQ to rebind a variable (this would look like

‘a <- foo

in ST-72). So anytime you wanted “semantic types” you would install a gauge kind of entity that would do checking that went beyond simple class checks — for example, one could imagine a type for a function called “sine” that enforced the body to only do a close enough approximation to a sine function. Typing starts to get useful when it becomes more and more semantic.
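A rough sketch of the gauge idea in present-day Python (only the flavor, not ST-72 semantics, and all of the names are invented): the variable gets a gauge installed once, and every later “assignment” goes through the gauge, which does the checking.

class Gauge:
    """Lives inside a 'variable'; all rebinding goes through it, so it can
    enforce a range (or any other semantic constraint) on every new value."""
    def __init__(self, value, lo, hi):
        self.lo, self.hi = lo, hi
        self.set(value)

    def set(self, value):
        if not (self.lo <= value <= self.hi):
            raise ValueError(f"{value} outside [{self.lo}, {self.hi}]")
        self.value = value
        return self

    def __add__(self, other):                  # arithmetic stays inside the gauge
        return self.set(self.value + other)

    def __repr__(self):
        return f"Gauge({self.value})"

temperature = Gauge(20, lo=-40, hi=60)         # install the gauge once
temperature + 15                               # arithmetic goes through the gauge (now 35)
print(temperature)                             # Gauge(35)
try:
    temperature.set(100)                       # violates the range constraint...
except ValueError as err:
    print("rejected:", err)                    # ...and is caught at run time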

3. Olof Salberger in a comment below points out that there has been interesting progress in dependent types, especially in LiquidHaskell and F-Star. This is all to the good. However, the add-ons start to diminish the readability of the code (one good reason for very high level languages is that the code can be smaller, more expressive, and — with great care in design — more readable and gistable).

Since it is relatively easy to make an experimental language, I would suggest an approach to an eventual “really nice” statically typed language by first making a dynamic language that can use objects as “semantic type definers and guards” to be installed in variables — as explained in Note 2. above — and to make a number of sample systems with it to find what level of static type-safety really pays for itself, and to then do co-evolution passes to get the dynamic types in good enough semantic shape so that all (or most) of what they do dynamically can be done at code writing time.

The basic point of this comment is that in the end one wants to manifest a running system with as few errors of every kind as possible — and no fatal errors — some of which will have to be detected at run-time. A wonderfully type-safe scheme that resists writing (say) an OS or similarly complex systems is not really all that interesting and worthwhile in the end. Similarly, a scheme that allows all that is desired to be written, but only as a horrendous hodge-podge of code, is also not interesting and worthwhile in the end.

We want “clean reality”!

Why did Alan Kay choose rectangular shaped Windowing as the initial style of GUI? What were the considerations and impracticality of differently shaped windowing approach?

The 4 answers already given are pretty good.

First, one of the earliest — if not the earliest — example of a windowed display with general clipping was that of Ivan Sutherland’s Sketchpad (ca 1962) which windowed a very large virtual display surface that could be continuously panned and zoomed in real-time controlled by some of the many sensor knobs on the machine below the display. This immediately brought up the problem of clipping lines that were not completely in the window — and arcs too because Ivan also used circular arcs as primitive elements.

image

The dynamic bridge that Ivan Sutherland is building in 1962 is on a larger virtual display than the physical display, and is therefore shown clipped

On the heels of Sketchpad was “Sketchpad III” which used much of the Sketchpad code and added 3D and multiple views of the object to provide the normal 4 views used in drafting (3D, plan, elevation, etc.).

image

In the mid-60s Ivan decided to invent “virtual reality” using a Perkin-Elmer heads-up-display originally made for helicopter pilots. This required the “virtual world” to be thought through — which resulted in the general notion of a kind of telescope or camera, one end of which — the window — was in the virtual world, and the other was on the display device — the viewport. (Note that this terminology didn’t last because it was difficult not to think of the thing on the display as “a window”.)

image

Fast clipping on calligraphic “line-drawing” displays was a real worry, because the displays after Sketchpad were told to draw a whole line (the Sketchpad display on the TX-2 just drew points). What is called the “mid-point” algorithm was independently invented by Danny Cohen (with Ivan) and Ed Cheadle (with whom I was working on the Flex Machine, which also had multiple windowing).
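To give the flavor of the problem being solved (an illustrative sketch in Python, not a reconstruction of the Cohen or Cheadle algorithms): outcodes settle the trivial accept/reject cases, and midpoint subdivision handles segments that cross the window boundary.

# Clip a line segment to a rectangular window: outcodes for the trivial cases,
# midpoint subdivision for segments that straddle the boundary.
LEFT, RIGHT, BOTTOM, TOP = 1, 2, 4, 8
XMIN, XMAX, YMIN, YMAX = 0.0, 100.0, 0.0, 100.0

def outcode(x, y):
    code = 0
    if x < XMIN: code |= LEFT
    if x > XMAX: code |= RIGHT
    if y < YMIN: code |= BOTTOM
    if y > YMAX: code |= TOP
    return code

def clip(p1, p2, visible, depth=20):
    c1, c2 = outcode(*p1), outcode(*p2)
    if c1 == 0 and c2 == 0:                    # trivially accept: both ends inside
        visible.append((p1, p2))
    elif c1 & c2:                              # trivially reject: both ends off the same side
        return
    elif depth == 0:                           # close enough to the boundary: stop subdividing
        return
    else:                                      # otherwise split at the midpoint and recurse
        mid = ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2)
        clip(p1, mid, visible, depth - 1)
        clip(mid, p2, visible, depth - 1)

segments = []
clip((-50.0, 50.0), (150.0, 80.0), segments)
print(len(segments), "visible pieces")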

So multiple clipping rectangular windows were around in the 60s before Parc.

The 3D work — mostly at Utah, but in a few other places — had shifted to trying to generate “continuous tone” images rather than wire frames, and this resulted in searches for efficient ways to determine what color each pixel should have.

This was not done in real-time back then for a variety of reasons.

When I thought about the Dynabook in 1968, I posited a single bit per pixel using an LCD screen (like those starting to be invented by George Heilmier at RCA and Peter Brody at Westinghouse).

I could calculate that one needed about 1 million pixels for being able to do most things somewhat reasonably. This was about 128K bytes — which was much more than I was hoping for the whole machine! — so I also posited that the LCD would have some kind of memory built in, but would still be fast enough to do 10–12 frames of animation.
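Spelling out that arithmetic (reading “about 1 million” as 2^20 pixels):

\[
2^{20}\ \text{pixels} \times 1\ \text{bit/pixel} = 2^{20}\ \text{bits} = 2^{17}\ \text{bytes} = 128\text{K bytes}
\]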

This is still not much of a display with reasonable resolution of about 100 pixels per inch, especially with regard to visual angle.

It occurred to me that windows on a pixelated display could be easily made to seem overlapped, and that refreshing a window that seemed to be occluded would seem to bring it up to the surface — and that this would be really great on such a small display because it would effectively magnify the display surface available without introducing modes.

A few years later at Parc we got to do an “Interim Dynabook” (the Alto, mostly designed by Chuck Thacker), and in 1972–73 we could only afford 128K bytes for the whole machine, but wanted to do a pixelated display not just for the future flat screens but because even with 1-bit per pixel you wind up with a general display — like paper — on which you can display anything.

So we sucked it up and decided to have the Alto display be about 1/2 the million pixels needed: 808 x 606, and using about half the available memory. (Many tricks were used via the Alto’s display list to not have to use all of them all the time!)

Two of the many kinds of graphical objects introduced in Smalltalk were “run-length bit-maps” and “two-bit bit-maps”. Both of these allowed arbitrary shapes to be composited on the screen (i.e. “images” for desk-top publishing are just windows with no boundaries showing, and with the effect of transparent surrounds).

Meanwhile, back at the ranch … (I don’t have room here to say much about “display languages” so will just point …)

We were all of a mathematical turn of mind, and earlier — while many of us were at Utah in the midst of many kinds of graphics that were fomented by Ivan bringing his grad students from Harvard out to “Happy Valley”, and this because John Warnock at Utah had invented the first really efficient way to do CT graphics — the desire to mathematise graphics was intense. The VR HMD used a home-built matrix multiplier, and — so sweetly — the matrices composed, so you could turn a whole chain of transformations into a single matrix if the math had been thought through and prepped.
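The “matrices compose” point can be sketched quickly in Python, here with 3x3 homogeneous 2D transforms rather than the HMD’s actual 4x4 pipeline (an invented example): the whole chain collapses into one matrix, which is then applied once per point.

import math

def matmul(A, B):
    # compose two 3x3 homogeneous 2D transforms (A applied after B)
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotate(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def translate(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]

def apply(M, x, y):
    return (M[0][0] * x + M[0][1] * y + M[0][2],
            M[1][0] * x + M[1][1] * y + M[1][2])

# The whole chain "translate, then rotate" collapses into ONE matrix...
M = matmul(rotate(math.pi / 2), translate(10, 0))
# ...which is then applied once per point instead of once per transform per point.
print(apply(M, 1, 0))   # (1, 0) -> (11, 0) -> approximately (0, 11)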

At Utah and then Evans & Sutherland, and then Parc, and then Adobe, John Warnock and a few others “refined the math” with a few intermediate languages to eventually produce Postscript. Among many good things, PS also introduced the ideas of masking and extruding, which provide a kind of generalized clipping.

If Postscript could be run fast enough, it would be a great replacement for “bit-blt”. A first try at this was on the NeWS UI for the Andrew system at CMU, which just barely worked, but was the right idea. The Nile system by Dan Amelang is a recent beautiful and comprehensive way to produce graphical images and their compositions.

If I want to improve my piano playing capabilities, are there any mathematical patterns I should be aware of?

John von Neumann said that mathematics is about “relationships about relationships”, and there are lots of these in most music: some of them are interesting, some even profound.

But I think there are important parallels between poetry and music with regard to form. One is that a formal nicety is not enough by itself to cover for lack of a content idea. For example, there are fugues of great complexity that are empty musically — Fux was famous for writing some of them. And there are various patterns that can provide spice to a composition: 2 against 3, 3 against 4, hemiolas (displaced strong beats e.g. playing C E G repeatedly in 4/4 time), etc. There are a number of “unplayable” arpeggios e.g. in Brahms where the composer was trying to indicate an effect to be pulled off rather than a call for a literal rendition of the score.

In other words, what is called “prosody” in poetry, theatrics, reading aloud to others, etc. is the center of playing any music. This is often easier in musical cultures that don’t use detailed written notations, because the music — as with the poetry — is conveyed “orally” with the poetry/prosody intact. Learning to play music by looking at a score on an instrument with limited expression (like a piano) can result in mechanical playing and wind up “losing the music”.

On the other hand, as with learning to read and write natural language, a fluent literacy in musical reading and writing is tremendously useful, especially on the piano with its vast literature of 10s of 1000s of pieces.

It is common in theater to have “readings” where actors gather together to read aloud a play to each other. This requires the actors to understand the overall play, who their characters are, and what they are trying to do, both in the large and the small. Prosody is almost forced because no one wants to sit through several hours of droning monotones!

The need for wind on wind instruments and the need for bowing on the strings, helps musical learners realize that the notes are not to be played as written, but as some kind of “speech” (sometimes really like speech in some baroque pieces), and especially as some kind of singing (most of the time). Notes are not in isolation but are usually in groups, with arcs and hesitations and perhaps little flurries (even while feeling an underlying pulse — which is not the same as the completely regular ticks of a metronome). There’s “buoyancy” to it all.

Another aspect of “buoyancy” in music is called “groove” in jazz. This is different from “being in the pocket” and “tight” but there are overlaps. Each of these is very important in classical music also. “Tight” means playing with another player where you start your notes absolutely together (and usually finish absolutely together). “Groove” is hard to explain because it is a kind of feeling of “oneness” and “rightness” (a single player can have “groove”). “In the pocket” is an overlapping idea of both “being tight with another instrument” and “being right with another player”.

Great classical players have groove.

As the composer Paul Hindemith pointed out, a piano can be played with a rake! There is no wind needed that requires breathing, no bow that requires changing. No pitches that have to be controlled. And there are lots of fingers doing lots of things somewhat independently.

And there is the written music, which does not indicate very much about how the piece should be played. And there are exercises, which usually have very little musical content. All of this contributes to “mechanical tendencies” in the learner’s playing.

What are the remedies? One way to think about this is that you are encouraging multiple parts of your brain/mind to learn different things as you work and play on musical instruments. Over all of this is the larger sense of the piece — and this does involve the prosody and expression — but there are lots of technical things that do have to be impressed on the parts of your brain/mind that can do complicated things but can’t think about them well.

So: IMO singing is the very best thing any musician can do whilst learning. And out loud at first until you can internalize real singing inside. I also like to listen to musicians play pieces to get their gists — YouTube is just wonderful for this — and also to compare with the score to see how the scores only “hint” — like the script of a play — the art itself is created by the combination of a composer/writer and a player. This will help build one’s own conception of the piece at hand.

I like to do the technical stuff in two parts (a) within the context of music (find tough stuff in a great piece of music and work on it), and (b) as repetitive exercises where the only thing is to get more even at various tempos (this is one of the places where a metronome actually helps).

Finally, it’s worth noting a few comments from Francois Couperin, one of the great keyboard composers from 300 years ago, in his book “The Art Of Playing The Harpsichord”. One was that the notated music was not to be taken at all literally, but just to be used as a guide. Another was the suggestion that children be kept away from the harpsichord for early study, but to put them on a clavichord, which is much more expressive. He also suggested that children be initially taught to play a number of pieces without using music, and to concentrate on getting them to sound musical. Then they can be shown what these pieces look like in score, etc. (This is rather like having language learning precede learning to read, but not by a lot.)

As I look back on my learning to play keyboards, I would have done better if I’d stayed with pieces that I could “play the notes” on, and really worked further to get them to be much more musically expressive …

How did you lose the bad habits you gained from self-learning a musical instrument?

See the great answers by Ben Simmons and Ed Durbrow.

After playing many jazz gigs around NYC and Long Island, I wound up in the Air Force in Texas in the early 60s. I was not in a band there, but got myself a wonderful Gibson Johnny Smith guitar. I decided to take advantage of not playing gigs to completely retool my left hand technique, which had evolved to only use the pinkie for chords. For passage work, I only used the first three left hand fingers. This was definitely holding back further progress.

This required about a year of doing about an hour of work every day, because it was not just scales etc., but I had to retool my brain for using the pinky automatically for improvisation. After Texas and the Air Force, when I started playing again, I found that much more work was needed for “playing live under the gun” to not slip backwards and just “stay on it”.

All in all, perhaps two years to get really comfortable again. Years later I read that it takes about two years to really ingrain a habit, and about two years to really undo a habit.

Among the best guitarists in the world, why aren't there any women?

There are lots of well known female “best guitarists in the world” in the classical guitar world, including with truly incredible chops of the kind valued in the popular and rock world.

But classical guitar is a very extensive art and passion — it is lots and lots more than just chops, etc. — and it includes deeply contrapuntal music as well, which is quite challenging for guitar given the way it is laid out and played. I could choose from many hundreds of examples, but I was just sorting through arrangements of Bach’s famous “Jesu, Joy Of Man’s Desiring”, traditionally played at Christmas time.

The piece is a decorated chorale with an instantly recognizable accompanying theme in triplets against a slower chorale melody in 3/4 time, and was originally part of a cantata for voices and orchestra. It has been arranged for many combinations of instruments, and many times for guitar.

The arrangement I like the most is David Russell’s, in part because — unlike most of the other arrangements — he successfully combines the two themes really beautifully (and ingeniously!) in a number of places just as in Bach’s original arrangement for choir, strings and winds.

This results in a piece that is extremely difficult to pull off smoothly, and make it sound buoyant and natural with no sense of strain. One of the problems is that the slower chorale melody in the upper voice really can’t always be played legato given the demands of fingering the triplets melody and the bass (but it still has to have “line” and to sound connected so that the actual disconnects don’t seem to happen …)

Even David Russell — one of the great virtuosos of our time — has a little bit of a struggle with his own arrangement. But how about listening to Kaori Muraji, playing the very same arrangement wonderfully and getting the sense of the full sound of the larger ensemble?

https://www.youtube.com/watch?v=jCF89FhcjeI

This is like magic (especially for any guitar player to hear and ponder)!

How can I improve my piano playing and learn to play faster sections? I can play them slower than proper tempo but cannot seem to increase speed to 'tempo'.

Here’s where a little more info from you about where you are would be helpful.

Metronomes and gradual increases are generally helpful at any stage.

But there’s another aspect to fast playing, and that is the mental transfer from thinking about each note to thinking about “necklaces of notes”. This is because fast music can have more notes per second than the thinking part of our brain can deal with, but we have other parts of our brain that can execute very fast sequences as “single things”. Thinking about each note creates a barrier.

One of the best ways to add this “chunked” kind of playing is to pick a metronome speed that you can play at, and periodically just double the speed of a bar — no matter what happens. This is called “burst mode”.

The basic idea is to start training the parts of your brain that are going to carry out the “string of pearls” passages. Doing slow practice a fair amount of the time is important because you don’t want to learn a bad version of the fast stuff.

I found that playing two handed scales in octaves over four octaves with different metronome tempos — starting pretty slow — but sticking in burst mode for one or two of the four octaves, gradually got me to “release” from worrying about each note while still paying attention to pulse and connection …

I don’t know a good book about keyboard technique that would help (it’s been many decades since I studied keyboards), but a book that is popular with classical guitarists is “Pumping Nylon” by Scott Tennant (an excellent pro who teaches at USC).

To what extent was it possible to build a digital computer during ancient Rome?

This is a very interesting question — right up there with one that I used to put to a class on computer design: “What computer would Captain Nemo have designed for the Nautilus? — Sketch out the design for this!”

The current question can be taken in a number of ways: including the meaning “With the technologies available during that time could a complete genius — beyond Leonardo — have figured out a device that we would call a digital computer?”

This way of posing the question also allows for “Greek thought” to be employed, since the Romans had many Greeks — free and slaves — who did some of the deeper thinking for that practical-minded people. This is a good combination because the Greeks didn’t care much for Engineering, and the Romans did — many of the most interesting mechanical inventions of the Greeks were toys, especially by the Alexandrian Greeks.

Let’s leave aside that one of the problems of slavery is that it removes one large motivation for inventing mechanisms to do much general work, including calculations. And let’s leave aside the question of “build-up” of precursor ideas, which even geniuses need and use — the question is about “can something be built?” at a given time in history.

Some of the other answers have correctly noted that both the Greeks and the Romans calculated using a kind of abacus (the word “calculus” is the term for the stones moved around on their counting tables). It’s important to note in passing that abacii do have “zero” implicit in how they work.

image

Reproduction of a Roman “pocket” abacus that would fit in a modern day shirt pocket. The “calculi” here are not stones but beads in sliding slots.

As a computer guy I was naturally quite interested in Babbage (and Ada), and was initially very impressed with the inventions that he had to do as precursors (like careful mechanical drawing, the turret lathe, etc.) and the inspiration of the Jacquard Loom. It was fun to get to crank the reproduction of the Difference Engine at the London Science Museum (super high precision construction, but a lot of backlash and effort).

A few years later I had occasion to really look at the earlier Jacquard Loom, and was completely knocked out by its astoundingly superior approach to mechanical design; even the huge mechanical action pipe organs of the time were not as wonderfully done!

I realized that Babbage had taken a very poor engineering approach — he was a mathematician, and tried to do things “exactly” — and really should have looked at the actual mechanisms of the Jacquard Loom much more closely.

image

Only the top 1/3 is the Jacquard mechanism. The whole loom is mostly wood with a little mechanism (including a lot of logic made from “coat hangers”).

Jacquard went completely in the opposite direction: everything is just flopping around except and only except when he needed precision. A lot of the precision he got was via “guide holes” that would constrain the floppy things but let them flop on either side (one way to look at this is that the guides were there to “reduce errors” periodically in the chains of causes and effects).

The result is that there is virtually no propagation and multiplication of error (in Babbage’s scheme you get enormous propagation of error). One of the many fun things about Jacquard’s approach is that the mechanism does not require inventing a new kind of loom: the Jacquard apparatus sits on top of an existing loom and simply pulls the threads according to the program on the punched card chain.

So one person operating a Jacquard Loom could affect thousands of threads via the punched card programs with human power alone.

What’s interesting here, is that there is nothing tricky in the Jacquard Loom mechanically — large parts of the logic can be built from coat hangers! — and it could have been made using just the technologies available on either side of 0 BCE. Here is a short precis of some of the technologies available.

The Wikipedia article on the Jacquard Loom is a start, but lacks enough detail. I’m still looking for the best description of the Jacquard Loom mechanism online (please help!). What’s important here are the actual details of how Jacquard saw to make selection not require careful precision except only where actually needed. For now try this pdf starting at page 5. And try this YouTube video to see some of the different parts at work driven by one person.

https://www.youtube.com/watch?v=OlJns3fPItE

There are several books with useful details: Essinger, James (2004). Jacquard's web. Oxford University Press, Oxford, and especially: Bradbury, Fred (1912). Jacquard Mechanisms and Harness Mounting. John Heywood Ltd., Technical Book Depot, Halifax, Yorks. The best book I found was: (1888) The Jacquard Machine Analyzed and Explained, by E. A. Posselt. Thanks to the wonderful Brewster Kahle and his Archive.org, this book can be found and downloaded via Google.

I think it would be quite possible to use modern thinking to design a programmable computer based strongly on the Jacquard mechanisms, and then to build it using only techniques, tools, and materials available ca 0 BCE.

I will devote the next week to trying to find an understandable online account of how the Jacquard Loom actually works (it is almost always the case that museums — even “science and technology” museums — just exhibit artifacts, and do not provide explanations or demonstrations of how and why they work).

Addition I

I had forgotten that I’d written about Babbage for Quora, and went into more detail there about the Jacquard Loom. What’s the best book about Charles Babbage in terms of technical detail?

I found a pretty good YouTube video from the V & A Museum that includes a 3D animated explanation of the Jacquard invention.

https://youtu.be/K6NgMNvK52A

It seems to not have a sound track, but I think the whole visual explanation is pretty clear. You might want to run the animation at 1/2 speed to really be able to follow the cause and effect relationships.

Addition II

Marcel Levy asked a question that is worth putting in the main body here: “So you’re saying it was not the practical side that was lacking but the theoretical one?”

I think: Yes. It’s worth looking at some of the mechanical inventions of the Alexandrian Greeks, including: the pipe organ played with keys with wind stabilized by water pressure, etc. Also Greek theater mechanics, etc. And of course, the Antikythera astronomical calculator.

Heron of Alexandria was the most interesting inventor of mechanisms ca 0 BCE. Besides the pipe organ, there was reportedly an odometer that counted digitally, and most interestingly a programmable cart (a recreation here on YouTube).

https://www.youtube.com/watch?v=xyQIo9iS_z0

What else did Heron do along the lines of these “toys to amuse and amaze”? He had the idea of “programming”, and probably devised other forms of it for some of the automata he came up with.

Looms had existed for thousands of years, and elaborate ones for hundreds.

The Greeks (and the Romans following) realized that the cultural “official” way to write numbers was “inconvenient”: we can see their real thinking by looking at their calculating machines (the various abaci).

We could definitely ask questions about “what did they really want to compute that was outside the scope of their abaci?” Babbage was driven initially by inaccuracies in mathematical tables (“I wish to God these calculations had been executed by steam!”). Turing was initially driven by Goedel’s results. Physical computing was driven by needs of warfare from WWII onward.

They most definitely had the brains and much of the mathematical and physical outlook. It looks to me that they lacked the feeling of need more than anything else that would drive them to use what they knew to see how to make a machine to compute for them.

What are the most complicated concepts in computer science?

I’m not sure what “computer science” means to the requester. Almost 15 years ago I had occasion to give talks at a number of universities, and at each one asked the audience of students and professors for their definition of the term. Overwhelmingly, the definitions would be called “engineering ones”, and just what “science” is actually about never surfaced. When I conduct an occasional class at UCLA, I’ve gotten the same response.

So, one answer to this question could be “Computer Science itself”.

However, what jumped into my mind was a substitute question: “What do you see the least understanding of in the ‘not-quite-a-field’ of computing?”

The three things that hit me in the face — via 56 years of observations — are partly related I think: (a) the hold-over from the 50s of an intense focus on “algorithms and data structures” (as opposed to processes and systems), (b) the near absence of “informed design” and “feel for scaling” (as opposed to taking the larger situations into account), and (c) the lack of “trying to help every kind of user learn more and become more empowered in computing itself”.

None of these are fair as stated, because a few people do make careful systems composed of processes, do careful design for integrity and scaling, etc., and do care about lifting the society as well as just giving them servants.

Still, I think it’s worth pondering just what could be thought about and done by taking a much larger look at what computing is actually about, and what today’s world and societies actually need.

I think this would illuminate many areas of (a) what didn’t progress when it needed to as Moore’s Law exploded and (b) what didn’t happen in the understanding of how to make new media a positive force in human life.

In a chord sheet, what does the 2 chords with slash like Am/F# mean? Playing them together on a piano makes a dissonance.

It’s worth thinking about this chord a bit. As the other answers indicate, the slash followed by a name means play that note in the bass.

If you play the chord with normal Am spelling — A, C, E, F# — you can see that the F# is the “6th” position in the chord, and this would be called Am6. A nice sound!

The inversions of this chord are: first inversion: Am/C, second inversion: Am/E, third inversion: Am/F#.

But it is also worth looking at this chord as though F# is the root. In normal spelling this would be F#, A, C, E.

So: almost an F# minor 7th (F#m7) if the C were a C#. So we can call this chord an F#m7–5 (or F#m7b5). You will often see this chord in jazz charts. If you play this in this spelling near middle C it will not sound at all dissonant, but vaguely mournful or ethereal.

Now notice that the first 3 notes are all minor 3rds apart — F#, A, C — so it would be a diminished chord except for the major 3rd between C and E. In honor of all the minor 3rds, this chord is also called a “half-diminished” chord, sometimes notated F#ø — in other words, a diminished sign (°) with a slash through it.
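If it helps to check the spellings above, here is a small sketch (my own illustration, with made-up helper names) that names the intervals of A, C, E, F# measured from each candidate root:

```python
# Interval content of A-C-E-F# from two candidate roots (illustrative only).
NOTE = {"C": 0, "C#": 1, "D": 2, "Eb": 3, "E": 4, "F": 5,
        "F#": 6, "G": 7, "Ab": 8, "A": 9, "Bb": 10, "B": 11}
NAME = {0: "root", 3: "minor 3rd", 4: "major 3rd", 6: "dim 5th",
        7: "perfect 5th", 9: "major 6th", 10: "minor 7th"}

def intervals(root, notes):
    return [NAME[(NOTE[n] - NOTE[root]) % 12] for n in notes]

print(intervals("A",  ["A", "C", "E", "F#"]))   # root, minor 3rd, perfect 5th, major 6th -> Am6
print(intervals("F#", ["F#", "A", "C", "E"]))   # root, minor 3rd, dim 5th, minor 7th -> F#m7b5
```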

It is worth playing with this chord to find the places “where it wants to go” (there are lots of possibilities). Then you could look up “half-diminished” in Wikipedia for a more in depth discussion.

One way to think about this is to ponder “where does the bass note ‘want’ to go?”, first looking at half steps and then whole steps. What if the F# goes to G? Does that make any sense? (what does it sound like?).

One way to think about this is that the F#, A, C, E, is kind of a D9 (a V chord of G) without the D. So this could be a way to get to either G major or G minor. Can you think of other chords it could resolve to with G in the bass?

Going the other direction, to F from F#, we could expand the E up to F, so the resolution would be to F major. We could also lead to a Bb/F or a Bb7/F, etc.

As with other ambiguous chords, you can try other roots to see if they supply more perspective. If you use C as the root, you get: C E F# A. This could be thought of as C6–5. You don’t see this very often, but you will see its cousin C7–5 (C E F# Bb) everywhere because of its chameleon properties (of being two symmetric chords in two different keys at the same time) — can you puzzle this out?

What was the first computer you worked with, and in what language did you program it?

I answered part of this question in a related question: When you were learning assembly language, what assembly language did you study?

My first real computer was the IBM 1401 in 1962 in the Air Force. It was a “character-based computer” with 6-bit characters (short bytes) accompanied by a 7th bit, which acted as a “word-mark” (an 8th bit was used for parity checking). IBM 1401 - Wikipedia

This allowed variable length “fields” to be defined just by setting a word mark at the left end of a sequence of characters, and the field would be addressed by the address of the right end of the sequence.

The encoding of the 6-bits for data was “BCD”, which mapped a column of a punched card into B A 8 4 2 1. This allowed the fields on a card to be mapped into memory using word marks (the cards would initially be read into a set location in memory).

image

The top two rows were mapped into the B A bits. The “numeric rows” were mapped as one number into the 8 4 2 1 bits.

The word-mark scheme allowed, for example, two different sized numbers to be added together a character at a time by the ADD instruction, given the two addresses of the low order characters of the fields. The first word mark encountered would stop the operation (and one can imagine the need to make sure the receiving field would be the right (longer) size).
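A rough sketch of the idea (my own toy model, not real 1401 semantics — it ignores details such as the sign carried in the zone bits of the units position): each memory cell holds a digit plus a word-mark flag, a field is addressed at its low-order end, and the add walks leftward until it hits a word mark.

```python
# Toy model of 1401-style variable-length fields (illustrative only).

def read_field(mem, addr):
    digits, i = [], addr
    while True:
        digit, mark = mem[i]
        digits.append(digit)            # collect low-order digit first
        if mark:                        # the word mark stops the scan
            return digits, i            # value digits and leftmost cell index
        i -= 1

def add_fields(mem, a_addr, b_addr):
    """Add the field at a_addr into the (longer) receiving field at b_addr."""
    a_digits, _ = read_field(mem, a_addr)
    b_digits, b_left = read_field(mem, b_addr)
    a = int("".join(map(str, reversed(a_digits))))
    b = int("".join(map(str, reversed(b_digits))))
    total = str(a + b).zfill(len(b_digits))
    assert len(total) <= len(b_digits), "receiving field must be long enough"
    for offset, ch in enumerate(reversed(total)):       # write back low-order first
        mem[b_addr - offset] = (int(ch), b_addr - offset == b_left)

# field A = 75 in cells 1-2, field B = 0123 in cells 4-7 (word marks at the left ends)
mem = [(0, True), (7, True), (5, False), (0, True),
       (0, True), (1, False), (2, False), (3, False)]
add_fields(mem, 2, 7)
print([d for d, _ in mem[4:8]])   # -> [0, 1, 9, 8], i.e. 0198
```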

The word marks were also used to allow variable sized instruction formats that could make use of a computation “so far”. For example, an “ADD” instruction could be in a two address, or one address, or no address format. Clever use here could allow very tiny programs to be written that did a lot.

And from the systems point of view, the word-marks allowed a lot of extreme cleverness to be used on these very small machines (they went down to 1400 character memory size). Memory management was a large part of writing an effective program for the 1401.

At the Air Force Air Training Command computer center we had an 8K character machine (which was regarded as a luxury by 1401 programmers in smaller shops). We found that it was possible to write a kind of operating system and “overlord” that would fit just barely into the top 1K characters …

Systems programming on the 1401 was also aided by its first class tape drives, which were both very fast and had extremely fast rewinding speeds. We had 6 drives, and would often use one as a swapping (overlay) memory device.

To what extent did the Xerox PARC Alto inspire the Apple Macintosh?

I wish it had done so beyond gross appearances. The bitmap display and mouse pointing device were the most obvious inspirations.

image

The Alto was intended to amplify a wide range of users.

Some of the software on the Alto — such as Smalltalk with its overlapping window GUI — was directly responsible for the overall look and feel of the Mac (and before that, the Lisa).

image

Photo of Alto screen showing Smalltalk-74 right before Smalltalk-76

The Alto — Chuck Thacker was the main designer — first appeared in 1973 and was a “parametric machine” of 16 virtual processors with zero-overhead interleaving, driven by microcode using an engine that was 5–6 times faster than main memory. The minimal display on the Alto was 808x606 (about 490,000 pixels).

image

The overall architecture of the Alto, with format of a microcode word below

11 years later, the Mac was a rather underpowered Motorola 68000, whose CPU was along the lines of a mini-computer (like a PDP-11 but not a Data General Nova). The Mac display was 512 by 342 (about 175K pixels, a little more than just 1/3 the pixels on the Alto 11 years earlier).
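For the record, the raw pixel arithmetic behind that comparison:

```python
alto = 808 * 606    # 489,648 pixels on the 1973 Alto
mac  = 512 * 342    # 175,104 pixels on the 1984 Mac
print(alto, mac, round(mac / alto, 3))   # -> 489648 175104 0.358
```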

However, it is worth looking at the earlier Lisa, which was more of what people needed, but was about the price of an average US car (to me, a bargain for one’s information vehicle! but not to most consumers and businesses).

One of the significant design strengths of the Alto was that the reloadable microcode allowed customized efficient virtual machines for very high level languages to be built, and be modified to introduce optimizations as they were discovered to be needed.

The microcode also allowed new real-time display and sound primitives (such as bitblt, the universal screen painting primitive; 2.5D half-tone animation; and both 12-voice polytimbral real-time sampling and 8-voice polytimbral FM synthesis) to be done several years after the Alto had been designed.

And most of what is usually HW in all the peripheral controllers was simulated by the microcode (including controllers for the display, disk, keyboards, mouse, handset, sound, Ethernet, and up to 1000 input lines). This both saved a lot of money in making these machines and provided much more flexibility.

Software is a very good idea! (But it also needs to be taken deeply into the “hardware”!)

Taking the 11 year difference into account, the various computers designed and built at Parc in the early 80s still had a microcode architecture but were now able to use several of the AMD “bit-slice” CPU components. This would have been a great choice for Apple to take up — it would have revolutionized personal computing — but most of the Apple technical people were very uncomfortable with bases that weren’t like mini-computers.

Both Intel and Motorola could have made chips that were more “parametric” with regard to their function. This never happened despite the fact that most versions of the chips they did make had various kinds of fixed immutable “micro” and “nano” coding internally.

Apple routinely blew chance after chance to have a dual strategy for the future by rejecting a number of opportunities to also have a line of workstations, which by Moore’s Law would be Apple’s future Macs 5–7 years out.

Is there static typing for a Smalltalk derivative?

Besides looking at the past, we should look to the present and the future.

The basic idea of an “object” is to embody a “full virtual computer” so it can provide useful services via messages in a machine-independent way and with enough encapsulation to be safe against attack.

This also means that everything can and should be an object because (a) a full virtual computer can simulate any other kind of thing, and (b) because this also allows objects to be self-defining, self-documenting, etc.

This means objects should be transportable across machines, and requires the inter-object messaging to be done well (i.e. messages are objects, an object gets to another machine by being sent as a message, etc.).

An object can contain everything it needs above the simplest VM substrate. But what if we want our object to be able to deal with external resources? From inside the object we could decide that a particular external resource will have the local name of “foo”. We expect that “foo” will be bound to “something” that is exterior, but to which we can exchange messages, and both use — and perhaps provide — services.

Now let us use the fact that we are 45 years removed from Smalltalk, and now have the large universal network that my ARPA/Parc research community was also working on. We first note that if we have a good idea of the receiver, then we can definitely send a message anywhere on the Internet, and this can work just fine. We will have to program in much less of a subroutine style than Smalltalkers did in the 70s (and probably still do now). In other words, we really are sending messages, and the replies might take quite a while, might not happen, etc. This is OK — the ARPA/Parc community was a “full-duplex” community and all of its network inventions were full-duplex.

There’s a lot to be said for full-duplex programming. One set of examples I like are from Erlang and its offspring. Using something like LINDA as a messaging “broker” makes sense, etc.

On such a world-wide network, we might not know the local names of useful services and resources, and they might not know enough about our particular object to help much.

I think this is a good place to ponder the kinds of “modern semantic typing” that are needed to deal with the enormous scaling that has happened since we were able to do a pretty good account and demo of “personal computing” in about 10,000 lines of code in Smalltalk-78 on a Dorado.

https://youtu.be/AnrlSqtpOkw?t=135

Let me start the discussion with readers, here, with three points.

• First we need to find the resources that would help us, and we don’t know their names. However, when we install “foo” we could associate with it the best set of stuff we can come up with to assist a local search engine to find that resource.

One starting place could be with something like LINDA++ where we are matching descriptions using a more universal vocabulary. But I think we are beyond that point already.

I think we should at least start with ideas such as those Doug Lenat had for AM (Automated Mathematician) and Eurisko (both of these systems were active and had lots of heuristic methods to help decide that “this thing is like that thing”).

One thing that would be interesting to associate with “foo” is a simulation of the resource we are looking for. This is something that would be more semantic than pragmatic, but which could eventually produce typical replies to messages.

We can now imagine that to bind to a resource, our object (and object system) will be able to negotiate actively with the environment of the object to get interior views of exterior resources.

• Secondly, we should imagine that when we are sending messages under the conditions of possible enormous scales, every message is not an encapsulated “telegram”, but has the potential of being a full-fledged ambassador, so that it can likewise take part in real negotiation of meaning at the receiving end.

• Third we could imagine an extension of a “publish and subscribe” scheme as the general way to build systems. The basic idea here is a bit more biological and stochastic. The matching and negotiation processes would be used so that each object has two public billboards, one for “requests for resources, help, etc.” and the other “offers to the general good”. The semantic matchmaker will make useful loose couplings, and very large, very robust systems can be made.
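Here is a minimal sketch of the “two billboards plus matchmaker” idea as I read it. The object names and the crude keyword matching are my own placeholders; the whole point of the paragraphs above is that real matching would have to be far more semantic and negotiated than this.

```python
# Toy sketch: objects with "requests" and "offers" billboards, loosely
# coupled by a matchmaker. Keyword matching stands in for richer semantics.
class Obj:
    def __init__(self, name, offers=(), requests=()):
        self.name = name
        self.offers = set(offers)       # "offers to the general good"
        self.requests = set(requests)   # "requests for resources, help, etc."
        self.bindings = {}              # loose couplings made by the matchmaker

def matchmake(objects):
    for needy in objects:
        for need in needy.requests:
            for other in objects:
                if other is not needy and need in other.offers:
                    needy.bindings[need] = other.name   # bind, but stay loosely coupled
                    break

world = [
    Obj("Spreadsheet", offers={"tabular-view"}, requests={"time-series-data"}),
    Obj("Sensor",      offers={"time-series-data"}),
    Obj("Plotter",     offers={"graph-view"},  requests={"time-series-data"}),
]
matchmake(world)
for o in world:
    print(o.name, "->", o.bindings)
```

The couplings the matchmaker records could be revisited, renegotiated, or dropped without the objects ever having named each other directly, which is the “loose coupling” being argued for.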

The only system I’ve seen over the last several years that “feels” like this will seem to have little to do with what I’ve been discussing:

General Game Playing of Mike Genesereth at Stanford.

But consider all the ramifications of what it means to write a program that is able to look at “requirements for a possibly very complex ‘game’ ”, and to generate a process that can then synthesize a game playing agent from those requirements.

In short, I like “types” if they can really pay their way by dealing with the semantics that really count!

Which of your guitars would you keep if you had to give them all up, but one, for the rest of your life? Let’s assume you own multiple guitars (acoustic, electric, nylon, steel, 12-string, etc.)

Like J Robb Wilson, I also have a 1962 Gibson Johnny Smith that I got in 1962 while in the Air Force, and played it on jobs for many years. I kept it in mint condition. Hard to imagine parting with such a source of memories.

image

Gibson Johnny Smith (not mine, but a spitting image from Google images)

A really wonderful guitar in so many ways (basically Gibson’s successful attempt to get Johnny Smith to endorse Gibson via making a pretty close approximation to Smith’s D’Angelico New Yorker that Smith was willing to play in public).

However, I think my Roger Borys B-120 is a better jazz guitar all around.

image

Roger Borys B-120 (not mine, but identical style from Borys’ website)

I got it about 10 years ago when I started playing guitar again, and noticed that several of my most admired guitarists around LA — e.g. Larry Koonse, Tom Rizzo — played Boryses. A perfect “tenor jazz sound”, created by the combination of a special laminated sound board and a custom designed pickup. Spiritually, I think it could be traced back to the classic Gibson ES-175, Jimmy D’Aquisto, etc.

As with most guitarists, I have a lot of guitars, including a number of really good classical instruments — a 2006 Jose Ramirez 1a, a 1950 Miguel Rodriguez, a couple of Kenny Hills, etc. There are two Kirk Sand “crossover” nylon string instruments that are superb. A Martin kissing cousin to “Trigger”. I’ve got two very good Telecasters (one especially good). I have an early original Steinberger “trans-trem”, an almost perfect electric guitar in almost all respects. Etc.

But, about 5 years ago a variety of causes — including age, arthritis, etc — left me with a left thumb joint with nothing good between the bones of the base joint. It was not repairable and very painful and swollen, so I had to quit.

Then a few months ago I was inspired by a great lute player — Thomas Dunford — to see what a lute was like to play, and found it was easier on the left thumb.

One thing led to another to early 19th century guitars with short scales, small necks, lower tension strings, etc. I found that playing these with the fingertips like the lute — without nails — worked really well, and the shorter scale especially made all the barres much easier to do with very little strength.

And this led me to Michael Thames of Santa Fe, one of the premier classical guitar makers in the US, who also likes to make 19th century guitars, including some of the classic early Torres models, and several of the earlier-in-the-century instruments. I got to try one of his instruments and found the action to be amazing — this is related to him being a very good guitarist himself.

This started a project for him to build me a “Mirecourt” style instrument ca 1820 with a short scale (620mm), narrow neck (48mm), and a few other ideas including two extra outrigger bass strings (which often could be found on these instruments back then). You tune them to notes “that will help”.

image

Michael Thames “Mirecourt”

I’ve now had this guitar for a little over a month, and I have to say that it is really the best guitar I’ve ever had the chance to play. Just amazing in both feel and tone. Just being able to play again after thinking I never would is a big deal that only musicians can really understand.

This would be the one I wouldn’t give up for anything!

---

2020 Update — Except … that my left thumb continued to get worse, and I wound up not being able to play “Mira”. But as this was happening, I accidentally attended a wonderful concert by the young lutenist Thomas Dunford, and was transported. I asked the local lute society about left thumb pressure, and they said much less. I have been happily learning it for about a year now. The thumb pressure is much less — I really have to watch it — but I’m able to play. The lute is much less like a guitar to play than I thought it would be. The increased possibilities for expression and the low gut frets and doubled strings require a lot more careful muscular learning, so my progress has been slower than I first thought. But it’s quite wonderful.

After trying and buying quite a few lutes, I was very lucky at the end of 2019 to have a truly amazing Renaissance style instrument come my way, made by perhaps the greatest living lute maker: Paul Thomson.

image

I’m very happy to be able to play this (and am paying a lot of attention to just how the left thumb feels in the hope of prolonging the joy of playing).

Did Xerox PARC write human interface guidelines?

Yes.

In the early days, we did many little test systems and many dozens of experiments, first on a special high-bandwidth and resolution “video generator” with a memory that could be used to store bit-map images (including for text characters). This experience was fed into the design of the Parc Alto, on which most of the subsequent experiments were done.

Also during that period, William Newman and Bob Sproull (both at Parc) finished and published (in 1973) their book on “Principles of Interactive Computer Graphics”, which contained pretty much everything known up to that point, including examples of most UI techniques (this is worth perusing to get a snapshot of that time). It didn’t contain the bit-map graphics that was being developed using the Parc Alto.

A few years later, there were a number of examples of “interesting” UIs, including the Smalltalk system, “Markup”, etc.

The two most interesting examples of “principled” human interface guidelines that were written up as design documents post the mid-70s were those for “OfficeTalk” — a terrific design for a hard problem, and very carefully thought through. Here’s a description of this very interesting system by Skip Ellis and Gary Nutt. Others deeply involved in OfficeTalk were William Newman, Tim Mott, Dan Swinehart, Jeff Rulifson, et al.

And a little later the UI guidelines for the Star computer design were composed (which were also very carefully thought through).

I don’t have the internal design document at hand, so I’m a bit fuzzy on all the contributors, but they included: David Canfield Smith, William Newman, Larry Tesler, Bill Verplank, Charles Irby, et al. The retrospective document is here.

What was the Whole Earth Catalog?

It was a series of wonderful collections of “access to tools”, especially including “tools for thought”: hundreds of books worth reading (the first 500 books to go in the Xerox PARC library were all the books in the Whole Earth Catalog). Much of its approach was systems-based, and this gives it a very future-oriented point of view.

image

There are also hundreds of physical tools of every kind, many for self-sufficiency, housing, farming, making, etc.

image

It first appeared in the late 60s, continued through the early 70s, and had a very large positive effect on the thinking of both “the 60s” and of the ARPA/Parc research community that was in the process of inventing much of the interactive computing and networking technologies that we still use today.

It was primarily the inspired brain-child of Stewart Brand.

The high point was “The Last Whole Earth Catalog” of June, 1971, which won the National Book Award. (Of course there were further editions …)

I like to have one of these around in both Los Angeles and in London …

Almost all of these books are worth reading today, as is the Whole Earth Catalog itself. (It would be nice if the Wikipedia people had studied it and learned from it.)

But the Wikipedia article on it is well worth reading: Whole Earth Catalog - Wikipedia

And most of the editions of the Whole Earth Catalog are scanned online with pdf versions, so you can download and read them. Back Issues - Whole Earth Catalog

It is worth perusing the entire Whole Earth Catalog website.

Added July 2021 — this website now seems to be down or missing — but thanks to Frank Heile (in comments below), here is a link to the pdfs

http://charlesbroskoski.com/_/view.php?id=whole-earth-catalog

What are the successors / continuations of the "talking typewriter" project/idea referred to by Alan Kay?

I was hoping someone would answer this so I could find out!

The question references a paper I wrote in 1972. About 5 years ago I was asked to write an essay on “The Future Of Reading” in which I referenced both the original “talking typewriter” for children by O.K. Moore (a terrific thinker), and a follow-on 20 years later in the 1980s done by John Henry Martin with IBM.

I also discussed some of the things that could be done now and in the future, but I wasn’t aware of contemporary work that was actually making systems that could be used by children.

Anyone out there aware of next steps with children learning how to read by learning how to write with a “responsive writing and reading machine”?

What did ARPA-IPTO / PARC learn about how to do good problem-finding?

Interesting Quora glitch (this question didn’t automatically get sent to me — I just found it randomly).

JCR Licklider (“Lick”) had a nose for large visions (you can experience his early reaction to interactive computing in his 1960 paper “Man-Computer Symbiosis”). This plus the fact that he was well regarded by a number of important people in the DoD, wound up with them — in 1962 — giving him quite a bit of funding by computer standards to set up IPTO (the “Information Processing Techniques Office”) within ARPA (what we just call “ARPA” today with regard to computing research).

When asked what he was funding, he would just articulate the one sentence version of “the vision”: The destiny of computers is to become interactive intellectual amplifiers for all of humanity pervasively networked world-wide.

One of his ideas was to not have the ARPA-IPTO office come up with “goals” etc. (he used to say: you can’t have good ideas within the Beltway (still seems to be true!)). Instead, he would try to find and fund “great people” who would go to the next levels of “problem-finding” and “goal-setting”.

For example, by 1962, Doug Engelbart had spent some years putting together a 144 page overview and plan “Augmenting Human Intellect: A Conceptual Framework” that still today is quite an amazing collection of important ideas, perspectives, things that need to be done, etc.* Engelbart was originally funded by the Air Force, then NASA (by Bob Taylor who went on to be an ARPA funder and who later set up Xerox Parc). But Doug got his main funding from ARPA-IPTO.

Engelbart was a really good “problem-finder”. One can learn a lot by really understanding what was shown 6 years later in “The Mother Of All Demos” (MOAD) in 1968 (we just celebrated the 50th anniversary of this on Dec 9th). This is critical because one will note — as Bret Victor pointed out in a reflection about Doug in 2013 — that both the intent and much of the reality of the demo were not really a precursor of today — much of it went considerably beyond the watered down stuff available today.

But then by reading the above document, one can see just how much more he — and they — were thinking about that didn’t get implemented and shown in MOAD.

Let me just offer a few properties of how the ARPA/Parc research community did “problem finding”.

• the funders were willing to fund “problem-finding”. It has to be done sometime, somewhere, and it requires support, and often a fair amount of work and time.

• the researchers were not initially concerned about whether something could be immediately implemented. This opened up thinking to a much wider range of possibilities, and “final goals” could be 30 or more years off.

• a lot of the thinking and explaining were done in terms of detailed “scenarios”: stories and walk-throughs of what the future could be like with the new ideas. For example, there are two such scenarios in Engelbart’s 1962 framework, one of them quite detailed and long.

• if a scenario wound up being “really desirable”, then the questions about feasibility could be extended outwards in terms of decades — say 30 years. Would it be possible then?

If so, then the idea could be brought back from the future to see what part could be done 10–15 years out, and if that had substance, then one could use Moore’s Law (1965) and its precursors to see what could be done from the present to about 5–8 years out using “supercomputing”.

Much of the early 60s ARPA research was done using very large computers with single users to manifest the ideas as real-time interactive systems. The ones of these that were good intensified the desire to make them practical. For example, Ivan Sutherland’s “Sketchpad”, RAND’s “GRAIL” system, Engelbart’s “NLS”, etc.

A good modern example is Bret Victor’s “DynamicLand”, which he gestated for many years with detailed scenarios before taking a shot at putting a group together and making a subset of the ideas into a real-time working version. This is quite a wonderful example of how to do “vision-making” and “problem-finding”. Bret needed the great funders of the past (and that kind of funding doesn’t seem to exist today).

---

* The original scan can be found by typing “doug engelbart augmenting human intellect pdf” into Google (e.g. Bret Victor carries it on his site).

What is creativity?

Creativity is seeing wonderful things that most people can’t see.

How did you learn to play the piano with both hands independently?

When I started playing keyboards as a kid I was exploring how chords and melodies went together, and the approach suggested to me was to learn “the three chords” as three note chords with minimal movement of the left hand, and see how these would work with many melodies.

My mother had a Baldwin “Orgasonic”, and its primer had the following shapes for “the three chords” in the left hand in the key of C:

image

This had minimal finger movements so was easy to memorize and produce.

Then there were lots of pieces in the key of C with these chord shapes that one would play with both hands. (The Baldwin primer suggested that one should “very soon” incorporate playing the root notes of the chords with one’s feet.)

There are lots of things that can be learned about important parts of music and playing by expanding on this. I used it a lot when learning guitar and figuring out chords and how they work in all the keys, etc.

Some years later when I was learning piano “for real”, I asked a teacher about good ways to get sight reading down, and he suggested “the 5 foot stack” of “everything”, but to start with Hymn books with 4 part arrangements, and to spend 15–30 minutes each day just “moving on through them”. Eventually, one can “see” how they will sound and one’s fingers will all go where they need to. (This is very much like the left hand idea above, but for both hands playing shapes, and more kinds of shapes.)

I can tune a guitar by ear without any reference notes. Does that mean I have perfect pitch?

“Perfect pitch” is actually “good pitch memory”.

We know this from (a) accounts of musicians in the past who could tell or sing pitches at will, (b) combined with the knowledge that the “standard pitch” in the past was different at different times.

For example, in the 17th century there were a variety of standard pitches for A, including the nice low one of around 392 (some organs even at 380), the “baroque pitch” of around 415, and the high pitches on some organs of 467 (or higher, maybe to save tin!).

The pianos in Mozart’s time were around 430, and piano tuning in the late 19th and early 20th centuries was set to 435.

435 is a little flat from 440, but 415 is a half tone flat, and 392 is two semitones flat (the A sounds like a G today).
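Measured in cents (100 cents to an equal-tempered semitone), the arithmetic works out as claimed; a quick sketch:

```python
from math import log2

def cents(f_ref, f):
    """Distance of pitch f below a reference, in cents (100 = one semitone)."""
    return 1200 * log2(f_ref / f)

for a in (435, 415, 392):
    print(a, "Hz is", round(cents(440, a)), "cents below A=440")
# -> 435 is about 20 cents flat, 415 about 101 cents (a semitone),
#    392 about 200 cents (two semitones, i.e. today's G)
```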

Pitch memory can be learned by those who don’t just take to it as children. There are various courses available. It was the fad when I went to music camp in the 50s, and I learned it. But it was quite a bit more trouble than it was worth for playing jazz in bars with old pianos that had been allowed to slip, for playing transposing instruments (like trumpet, clarinet, etc.), and for baroque instruments such as harpsichords. In all cases the extra interference wasn’t worth it, so I unlearned it.

Most trained musicians prefer the development of acute relative pitch. This is a floating version of “perfect pitch”. There is a reference pitch, and all the other pitches are identified by remembering “where you are”.

This includes “hearing the notes” as one looks at sheet music. With relative pitch one can “ ‘see’ what things will sound like” (a somewhat bizarre phrase in English), and there is no problem to move from a piano at A=440 to a harpsichord at A=415.

In some systems of musical teaching, especially for children, a relative language for pitches — called “solfeggio” — is taught. This is “do, re, mi, fa, sol, la, ti, do”, etc. (with additional syllables for the sharped and flatted notes).

There are two very different uses of this. Many European countries identify the syllables with fixed notes on the piano: “do” would be “C”, etc.

For much music, this is much less useful than to identify “do” with the “tonic” note of the key one is in. In the relative use of solfeggio, “do” would be G in the key of G, D in the key of D etc. This allows one to “see/hear” where the notes are relative to the key one is in.

This covers a lot of music. But there is also music that is not in terms of keys, and for this more general schemes are needed.

What are your reflections on “The Mother of All Demos”?

I’ve been asked to give talks at the 25th, the 30th, and the 40th anniversaries of the 1968 “MOAD”.

Here are the remarks I made 10 years ago at the 40th anniversary:

https://youtu.be/WN2BBNDdjgI

Recently, I’ve done two presentations for the 50th anniversary (a) on Dec 9th at the Computer History Museum via Skype, and (b) for a Japanese conference commemorating the demo on Dec 10th via Zoom (right now it is available via YouTube — my talk starts at 28:22, is about 44 minutes long, and there is an interesting Q & A session afterwards).

Bret Victor wrote a very good comment when Doug Engelbart passed away a few years ago. It contains the advice I always give to people about Doug:

(a) don’t get distracted by what seem to be similarities around today to what was in the demo (because the intent of Engelbart and his people was quite different and much more lofty than the commercial dumbed-down technologies that are prevalent today). This means that the meanings of the terms as used today are not the meanings for these terms 50 years ago.

(b) to understand what Doug was out after, read the 1962 proposal that lays out a conceptual framework for “Augmenting Human Intellect”.

Musicians, how do you play your instruments for hours while staying entertained?

To a musician, music is life itself. “Entertainment” doesn’t come close to the “merging engagement” one has.

---

On reflection, the answer above didn’t take into account the range of “how life feels to different people”. So, trying again:

To a musician, playing music has much of the joy of falling in love and merging with one’s beloved.

Out of the inventions to come out of Xerox PARC, which were the most grossly caricatured by others and which were the most thoughtfully done?

Does “caricature” imply intent, or should we consider it just an observation of the result of sometimes blind processes?

It’s hard to caricature the laser printer, and in fact the rush to make lower cost low printing volume alternatives — e.g. Ink Jet — resulted in pretty high quality devices for the most part.

The progression of the Ethernet has been interesting, because current versions of networking have much more switching than the original Ethernet at Parc. I think most of these additions/changes have been various kinds of engineering decisions where — in small enough domains — various additional control does not get in the way of the “eventually perfect transmission” design aims.

The main parts of the programming style that is called “object-oriented” today have much too much of “simulating data-structure and procedure” programming. I would call it — at best — “abstract data types” and not “in the spirit of ‘real OOP’ ”. A ‘real OOP’ progression from Parc would very likely have been in the spirit of Linda, and with many of the elements that we find in Erlang, and moving towards a more declarative and knowledge-based system.

But this would also retain the Parc ideas about “simulation” (fits in with “knowledge-based”) and to use a full version of McCarthy’s “fluents” to “also simulate ‘time’ ” to allow both state and “transformations and changes that can be reasoned about” to be present (I have written about these ideas elsewhere in Quora).

An equally large departure from the Parc style of ‘real OOP eternally running’ is today’s reversion to (or retention of) “simulated IBM cards” for programs fed to compilers and loaders with from-scratch systems builds (this is painful to see today — or any day — it misses so much).

But, for a short answer like this, I think I would pick the intent of the GUI — and of ARPA/Parc’s intentions for personal computing itself — as the most misunderstood Parc inventions. I wrote an essay in The Art Of Human Computer Interface Design (in 1989) that gave a few of the ideas behind that design.

Basically, the idea was that a UI is not just about “control panels for nuclear reactors”, but is “an environment for learning what can be done and how to do it”. This means it has to be not just explorable, but also able to be experimented with, and this means that you have to provide really good UNDO facilities. It also means that the UI needs to manifest “things to learn and how to learn them”.

It is critical to be able to integrate anything with anything when that will help the end-users. The Parc GUI was very different here: it had unlimited “desktops” that could persist over time, each of which could manifest and combine any and all of the useful objects in this system. There were no “apps” per se.

The viewing mechanisms were general via “windows” having boundaries or not, and used as much for multiple perspectives as for simple viewing. There were no separate provisions for “pictures” etc.

Several other important considerations were to be able to “pop the hood” of “power” wherever found, to understand it, and to be able to safely add and modify it.

What could be done at Parc ca 1978 can be seen in this presentation using it that I did for Ted Nelson a few years ago:

https://youtu.be/AnrlSqtpOkw?t=135

A great combination of all of the good ideas from Parc and Apple was never put together through the 80s, but Apple added quite a bit in many ways, while missing some of the good stuff that had already been done. (A Hypercard on a Smalltalk framework would have been a really great advance.)

Today, what can I say? A smartphone won’t teach you gestures that could make you much more powerful and efficient. Smartphones don’t generally UNDO (you go back out to the top and start over). And so on for both personal computers and phones for many more dreary paragraphs. Most apps couldn’t be more stovepiped and further from integrating with others … and so forth.

A simple way to look at it is that marketing people don’t like learning curves, so they put “training wheels on the bike” and don’t tell anyone they are on. So people optimize a “non-bike” in some of the most ridiculous (and worse: detrimental) ways. This results in something like this:

image

but in the computer case, perhaps with square wheels or flat tires. (I would call this an “extreme caricature of ‘bicycle’ “!) Putting it on tracks where someone else has chosen the directions would be even more apt.

The best way to learn “real bike” is with “balance bikes” such as:

image

The important thing is to learn to balance and to turn into possible falls. Many “not quites” can safely be made because the feet are near the ground and can catch the fall. The less important part of bike riding — the pedals — are gracefully added later.

You can understand something about humanity, marketing, persistence of bad ideas, resistance to good ideas, problems with teaching and learning, and how all of these relate to public computing by looking at how bikes and bike riding are still conducted today.

So, the first question to always ask when doing UI design is “what is the ‘bike’ ?“ (this is what you don’t want to lose, and is the eventual aim). Then you can ask “what is the ‘balance bike’ ?” (this is a real abstraction of the real thing that is a good place to start). If you start with “what can people learn?”, you might come up with a correct but terribly debilitating solution (as have so many attempts over the last several decades).

I don’t think coming up with “Guitar Hero-like” non-guitar UIs that claim to allow you to touch “guitar” is good for anyone, and in the large they are disastrous for humanity. A little fantasy is a good thing, but human destiny requires us to get better at constructing maps for “realities” that are better matches to our actual situations (and inventing and using user-interfaces that help us learn how to do so).

Is the night shift on iPhones good for your eyes?

Note: I only answered the original question. Some other agency changed the question without my permission, and my answer below doesn’t apply directly.

At Xerox Parc in the early 70s, the Alto’s video display could be used for either polarity (dark text on a light background, or the reverse), so we did some tests. Those tests indicated less eye strain with a predominantly white background under normal ambient conditions, because your eyes would retain roughly the same level of light-dark adaptation going from looking around the room, at paper, etc., and looking at the display.

A dark close up display will induce dark adaptation while looking at it and light adaptation when looking elsewhere. The Engelbart researchers had earlier come to the same conclusion for NLS and went for black text on white screen (and pretty likely for the same reasons — they also thought of their work-station-terminals as being part of larger daily life).

image

Bill English and an NLS terminal ca 1968

So the general answer to this question seems to be “No”.

It is possible that in a very dark room a mostly dark background with lighter text on a display might be easier on the eyes (but this would imply a kind of troglodytish existence that is likely not healthy for human beings …)

Why does the American educational system seem to lack confidence in teaching higher-level mathematics in primary schools, where in many other countries, math is embraced instead of avoided?

“higher-level mathematics”? “in many other countries ‘math’ is embraced”?

I’m not sure what the italicized terms are supposed to mean here, especially with regard to children. Let’s go for the non-pop culture meaning for the first, and with Seymour Papert’s and Jerome Bruner’s intentions for the second.

The pop-culture tends to use the terms for “turning the crank on calculations involving numbers — usually just arithmetic — and occasionally for getting stuck at algebra”.

Bertrand Russell said that maths are “p implies q”; John von Neumann defined maths as “relationships about relationships”. We can use the idea that maths are ways to represent ideas and transformations so some coherent reasonings can be done in terms of the representations alone in ways that are completely understandable.

Most school systems in the world do not teach children “real math” as practitioners of mathematics would recognize it. For one thing, the understanding part is almost always left out, and replaced by memorization of patterns to be recognized and rules to be followed. Different school systems carry this out with various degrees of effectiveness.

A bug that has shown up a number of times when needed reforms are demanded — especially in science and math — is to try to teach children watered down versions of the adult subjects. A particularly egregious example was the “New Math” (starting with set theory) that came out of the educational reform movements in the 60s after Sputnik. Part of the confusion was the conflation between understanding and usage (mathematician Marvin Minsky quipped that the problem with New Math was that you had to understand it every time you did it!) In Daniel Kahneman’s terms, mastery of a subject involves being able to get fluent in both “Fast thinking” and “Slow thinking”. We can see this problematic conflation in parts of the “Core Curriculum” approach to “math” (especially arithmetic).

In any case, the key question is how to get children going on the “real things”. The kinds of thinking that children can and can’t do at different developmental levels will often mean that “what adults use” is not a good choice for children’s minds. Watering down loses “real”. The remedy is to invent “children’s versions of ‘real’ “. This is often not easy, but the process can be great fun to do (and a way for adults to re-engage more deeply with some of their favorite subjects).

For curricula, both the slow-thinking “understanding the subject” and the fast-thinking “being fluent in the subject” skills need to be devised.

Seymour Papert was an excellent mathematician who had also spent 5 years with Jean Piaget, and wound up with a very good nose for the kinds of things children could actually do at different developmental levels. His ideas revolved around inventing kinds of “real math” that children could do, and that some really profound and fun versions could be created (e.g. forms of calculus involving the differential geometry of vectors, reasonings in terms of sequences of actions, etc.) if the interactive computer was included in the “tools for thinking”.

The power of this approach is still not at all understood, mostly because hardly anyone in the math teaching loop in the US understands “real math” — and those few who do, have difficulty understanding the minds of children.

Similarly, Jerome Bruner challenged US education with “Any subject can be taught to anyone in an intellectually honest fashion if you heed their level of development”; another way to say what Papert advocated. Bruner backed this up by heading a large amazing curriculum design for “real science” for 5th graders (“Man, A Course Of Study”: real anthropology for 11 and 12 year olds).

The basic idea is that “children’s versions of the subjects that retain the hearts of the matters” need to be invented for most of the subjects that are very different from commonsense usage.

In the more recent “coding for all” efforts that are aimed at schools, many of the things that should have gone into the goals and standards — known from 40 years of past experience — were omitted because the committees doubted that teachers could understand them. In education for the benefit of children, putting adults first is a high moral crime!

These efforts by a pop-culture involved with fads, desire for vocational certification for future jobs, and comfortable with (and even desiring) the substitution of labels for content, have pretty much killed every tinge of “actual education” in most areas, especially the ones that involve technique.

A key idea that escapes the pop-culture (partly from blindness and partly via denial) is that many of the recent inventions that have changed our world require a qualitative shift in how the world, and the knowledge about the world, is perceived (the five-dollar phrase is “epistemological stance”). Humans have been on the planet for several hundreds of thousands of years, so even the invention of agriculture is relatively recent (about 3% ago). Writing: about 1.5% ago. The modern idea of math dates to about 0.75% ago. Modern science: about 0.15% ago. Modern engineering is even more recent.

We humans traditionally learn our epistemological stance in our world early in life from the surrounding culture and its impact on our ad hoc reasoning abilities via our genetics. There are various levels of commitments to these stances around the age of 7 that get more difficult to change for most humans as they get older.

When we talk about “humanity” the mass noun implies a distribution of variance, so the early commitments will be firmer or looser at the individual level — but it’s a reasonable generalization to say that for “most” children “much” of their world view, values, beliefs, etc. are formed early in life.

This means that for most children the most important education for a “modern epistemology” has to happen in the early years. And this means that the most important human influencers — the parents and teachers — need to embody the new epistemology.

But generally they don’t. And, since adults tend to try to recapitulate children in their own image, we get generation after generation of “non-modern” parents and teachers. (And voters.)

In theory, teachers should embody the most important epistemological ideas in a culture. This would be a positive route to having the schools be a stronger influence for new ways of thinking than the home — and in a few generations to produce more sophisticated parents (and easier starts for teacher training). If a generation is about 25 years, then about 9–10 generations have already gone by in the US since (especially) Jefferson advocated this as being a prime goal for public schooling (some progress was made but for the country as a whole, not enough).

But this was the case for the teachers that Montessori trained, and it was one of her expressed aims that her schools embody the new ways of thinking as a culture (“You can’t teach the 20th century in a classroom!”).

This is not the case for US elementary school teachers (not even close). In the US, it is a commonplace notion that any adult knows enough to advise a 5 year old (this is not remotely close to being true). In fact, in our world, the advisors for young children need to be the most sophisticated adults.

A society that understands this can readily afford to pay for it, and get it. Ours doesn’t, doesn’t, and doesn’t.

When you were learning assembly language, what assembly language did you study?

My first real machine for programming (and learning assembly coding) was the IBM 1401 in the US Air Force in 1962.

(After subsequently learning a few more machines, I realized that there were a wide variety of approaches to basic architecture. This became much more standard 15 or 20 years later, but I think programmers today would find the first three machines I learned to be surprisingly different from anything in use today.)

In the early 60s most programming on most machines was done in assembler for three main reasons: (a) they were slow and small and pretty much everything had to be optimized for speed and space (for example the 1401 we had was just 8K 6-bit bytes plus tape drives), (b) the few “higher level” languages around (like FORTRAN) that could do a good job of optimization took a lot of machine time to compile, and (c) most programmers thought of HLLs as “for sissies” (sound familiar?).

Still, IBM was forcing HLLs, and this meant that it forced its technical people to implement both FORTRAN and COBOL on every machine it sold, no matter how small (being able to run compatibly across the many machine types was one of the big selling points for HLLs in those days (they weren’t very “high level”)).

The assembly language used by the 1401 was Autocoder — a macro assembler — that I realized later was unusual in its powers when compared to most others. The big added feature in Autocoder was the ability to “tailor” macros. This was done by putting on each line of code little expressions that would be run at assembly time — most of them testing a macro parameter — that would decide to include the line of code or not. You could think of a 1401 macro as a “heuristic” piece of code that would expand into the minimal instructions needed to do the task given the parameters supplied. (This could get quite intricate.)
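This is not Autocoder syntax, but a small sketch of the idea: each line of a macro body carries a little assembly-time test over the macro’s parameters, and only the lines whose tests pass are emitted (the mnemonics and field names below are placeholders of my own).

```python
# Sketch of a "tailored" macro: each candidate line has an assembly-time
# condition on the macro parameters; only the lines whose conditions hold
# are included in the expansion. (Illustrative only -- not Autocoder syntax.)
def expand_move_and_add(params):
    template = [
        (lambda p: p.get("clear_dest"),      "MCW  @000@, {dest}"),   # clear destination first
        (lambda p: True,                     "MCW  {src}, {dest}"),   # always move the field
        (lambda p: p.get("add") is not None, "A    {add}, {dest}"),   # add only if asked for
    ]
    return [line.format(**params) for cond, line in template if cond(params)]

print(expand_move_and_add({"src": "AMT", "dest": "TOTAL", "add": "TAX"}))
print(expand_move_and_add({"src": "AMT", "dest": "TOTAL", "clear_dest": True}))
```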

Subsequent machines I learned — that required assembly language programming — were the Burroughs 220, the Control Data 3600, and the Control Data 6600 (the 3600 being the most standard architecture). Then the DEC series of machines, etc. etc. and finally the Xerox Parc Alto, where we programmed its microcode in assembler.

The big exception was the Burroughs B5000 (ca 1962) that had a HLL architecture in the hardware, had an HLL systems language (an ALGOL extended for systems programming), and was never programmed in assembly as far as I know. I learned this machine in the Air Force but never got a chance to actually use it (Burroughs was late in delivering, and I had moved on by the time they did deliver.) However, it became the basis for many “byte-code interpreters” for HLLs over many subsequent decades.

The 6600 supercomputer in 1965 was perhaps the first RISC architecture; its inner CPU ran 10 times faster than the main memory (100ns vs 1 us), with parallel arithmetic, fetching and storing, and a small kind of cache for sorting out sequences. It had a very rudimentary assembler, and the programmers of this machine had to do a lot of work to keep the main memory cycling and the tiny cache not exceeded. It took quite a while for the FORTRAN on this very fast machine to create code that would even run faster overall than on the 3600. (The 6600 later became a favorite target for computer scientists trying to come up with general optimizers for arbitrary machine architectures.)

What is the "proper" technique to play a classical guitar without fingernails (if there is one)?

Some pointers. First take a look at lute technique over many centuries, and you will find that most lute playing of all styles and eras was done without nails — the main exceptions were a few players, especially those who mainly played “continuo” (background chords and figurations in groups), where more volume helped and less subtlety was required. There is quite a bit known — much of it written down — about how to get many different kinds of tone and volume on a lute using fingertips.

Second, take a look at the early phases of guitar, especially the 6 string guitar starting around 1800, and you will find that most players played with their finger tips and not with their nails. For example, both of the two main virtuosos in the early 1800s — Sor and Giuliani — played with fingertips. The great player and composer of the late 19th century — Tarrega — started with fingertips, went to nails (partly for concert projection reasons) and then retooled back to fingertips for the rest of his career.

Segovia was very opinionated, had a large influence, and castigated all players who didn’t use nails. I think this set the current nails only idea today (and also was a big influence in the rather heavy, big, and loud instruments that are common today).

I had to quit classical guitar about 5 years ago because of a ruined left thumb joint. A few months ago — accidentally — I tried a lute and found that the bracing pressure on the thumb was much less (partly because of the shorter scale). This got me going again on shorter scale instruments, both lutes, and “romantic era” guitars like they had 200 years ago. I found quite a bit to learn about producing tone with fingertips from both lute players and the subset of classical guitarists who use fingertips. I am just a few months into this process, and there is much more to learn than I currently know.

But: wonderful tones of wide varieties can be produced with fingertips, and quite a wide range of volumes. There is some overlap in the sounds that can be produced, and there are some effects that only can be made with one technique or the other. YouTube has some good examples. For example, take a look at the guitar videos of Rob McKillop. Besides some beautiful renditions of pieces, he also has tutorials about how to create great tone with fingertips.

I should mention here that I got very uninspiring tones when I first started. Rob McKillop — who is a long time lute player as well as guitar player — says that it took him about 3 years to really develop his technique.

Some of the lute technique does apply — especially having the right hand so the fingers are more along the length of the strings rather than the right angle that is normal for nails playing. A 45 degree angle is a good place to start. Having the guitar a bit higher helps get a better angle with the right hand and arm.

Here’s a very good lute player who is also an excellent early 19th century guitarist — Anna Kowalska — playing a waltz by Augustin Barrios using fingertips. She is playing on a copy of a ca. 1820 French guitar. You can see that her right arm and hand position is more lute like than standard classical guitar.

https://www.youtube.com/watch?v=rX_Hm71qnCk

As with the nails technique, one “pulls and slides off” rather than “plucks and pops”. The pulling and sliding with fingertips is done at more of an angle, and the pull is a bit more upwards. I found that about two weeks to a month starts yielding some very good sounds, especially at lower volumes. This is a good place to start. I have been playing classical guitar “grilled cheese sandwiches” — the small Tarrega and Bach pieces, Barrios’ “Julia Florida”, “Les Barricades Mystérieuses”, etc. — i.e. the easier pieces one could play in a concert, but without excruciating technique required. Memorize these, and you can put all of your attention into listening to the sound, and trying things day after day. Eventually, the hands home in on producing much better sounds.

Getting pearly sounds at higher volumes is possible, and I’m still learning how. As with the lute, the “good” fingers for tone are the thumb and middle; the index is better for the weaker beats. The ring finger actually works quite well for top notes of 4 note chords (especially if it has gotten strong from previous playing).

We should not be surprised that different people have differently shaped fingers and pads, so part of the process is to discover what works for you (this can be tricky because nothing works very well in the beginning). One tip I got from a video of Rob McKillop is to try different nail lengths, even if just playing with the pads. This is because, for some finger shapes, the nail will support the pad for better (and worse) effects. I found that “not completely clipped off” makes for a better sound for me. This requires a little work, but not in the same ballpark as the careful contouring needed for playing with the nails.

A big discovery for me when starting to learn the lute was the increased deep sense of intimacy with fingertips — this even though when you play with nails you do start by contacting the string with both nail and fingertip. With just fingertips, you have the sensation of touching the string directly with both hands, and there is just music to be brought out in between.

The other part of this is that both real gut and synthetic gut strings at the lower tensions and shorter scale lengths have special qualities for fingertip playing. Part of this is the nature of the material itself and the kind of tone produced, and part is that the texture of these strings is not super smooth like nylon, but is a little “toothy”. This allows a lot more control with the right hand of just how you are pulling the string and releasing it. And the toothiness makes a big difference with the left hand for getting much more “grab” and control of more kinds of vibratos than are easy to get with the smoother strings.

To me, the main reason to deal with the much larger technical demands of fretted instruments compared to keyboards is that you can get much more expressive sound and ranges of sound. Fingertip playing really delivers!

Would you rather have three $1000 guitars with various characteristics (Strat/Tele/LP/semi-hollow, etc.) or one $3000 top-of-the-range guitar?

The question is sort of aimed at the electric (even rock) guitar world, but a perspective from the classical guitar side of things might be useful. An additional bit of context is that I have a “nice number” of very good electric guitars of various kinds.

The first thing about high end classical guitars is that they also have “various characteristics”, especially along the dimensions of tone and action, and to a lesser but useful extent in size, and even shape. Most serious classical players have more than one really good instrument for similar reasons to electric players, because certain kinds of music can fit certain instruments better than others.

It’s not that you can’t get a very good classical guitar from a luthier for under $5000 (you can), but here, let’s start there and look toward (say) $12K or so. I’m guessing that most classical players with this budget would go for one really great instrument, and for the very same reasons that violinists, pianists, etc. would.

In this range, the price is not automatically a measure of the goodness of the instrument. This is because — as in the electric world, and in the violin world too — there are fads, famous makers, etc. The good news is that there are some really cosmic classical guitars in this price range, and some of them are closer to $5K than $12K.

I want to echo and enlarge a point made by one of the other answerers: a really great piano (and not a concert grand either) these days is more than $100K (e.g. a Boesendorfer 200 and above). You can get a really great guitar for less than 1/10th of a great piano (and you can take it with you everywhere). This is one of the great musical bargains of all time.

In the early days of the computer mouse, what were the arguments for and against making it one-button versus having multiple buttons?

The mouse was invented by Doug Engelbart and Bill English in 1964 for their proposed “oNLine System” (NLS). There were a number of other pointing devices around at that time, including the light-pen/”gun”, and several kinds of track balls. Invented the same year was the RAND tablet-and-stylus (it was really a great piece of engineering and worked wonderfully well, albeit for a very high cost).

The vanilla NLS input devices around the time of the “Mother Of All Demos” (50th anniversary on Dec 9th this year) were a typewriter-like keyboard, a “chord keyboard” for the non-dominant hand, and a mouse with three buttons for the dominant hand.

image

The idea of this scheme was that “all things should be doable” while navigating the hyperspace — “both hands out” — or typing large amounts of text “both hands in”. This way one did not have to switch back and forth between pointing and inputting. In practice, the middle typewriter keyboard was just used for bulk text of paragraph size and larger.

All text and all commands could be issued with “hands out”. The five “chord keys” on the left plus the first two mouse buttons provided 7 bits of input (the system remembered the down keys — so you didn’t have to be perfect — and would send the whole byte when all keys came up). The rightmost button on the mouse (the user’s index finger is over it) is “command accept”.
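
Here is a rough sketch of that chording scheme in Python. The bit assignments and the names are just assumptions for illustration, not the real NLS layout; the point is that keys can go down in any order, the bits accumulate, and the finished value is sent only when everything has come back up.

    # Sketch of "chording": remember which keys have gone down, and emit the
    # combined value only when every key has come back up. The bit layout here
    # (bits 0..4 = chord keys, bits 5..6 = the first two mouse buttons) is an
    # assumption for illustration, not the real NLS assignment.

    class Chorder:
        def __init__(self):
            self.accumulated = 0   # bits seen so far during this chord
            self.held = set()      # keys currently down

        def key_down(self, bit):
            self.held.add(bit)
            self.accumulated |= 1 << bit

        def key_up(self, bit):
            self.held.discard(bit)
            if not self.held and self.accumulated:
                value, self.accumulated = self.accumulated, 0
                return value       # the completed 7-bit chord
            return None

    c = Chorder()
    c.key_down(1)          # a chord key (bit 1)
    c.key_down(5)          # the first mouse button (bit 5), part of the same chord
    c.key_up(5)
    print(c.key_up(1))     # 34 == 0b0100010, sent only once every key is back up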

The command scheme was a bit more verbose than it needed to be (there were a few reasons for this, covered in another answer). It was basically procedural, with the command verb first.

give command

give kind of entity

give other parameters

command accept

There was a bit of a learning curve with this scheme, but you were rewarded by most executions responding in sub-seconds, so you could usually issue a few commands per second (worth comparing to the snail’s pace of UIs today, except for scrolling, etc.).

One didn’t learn “the bits” but started with learning basic editing and navigation commands: Move Character <this one> <there> <accept>, etc. This would get you to learn “m” and “c”.

A few hours would allow you to do most things, and about 10 hours were enough for comfort. Adepts could type 20–30 words per minute with “hands out”.

In practice, this ran afoul of “value uncertainty” for most outside users (i.e. they said to themselves “should I put in the effort to learn this?”, and most wound up saying “no”). This is basically the “violin problem”: it’s incredible once you learn it, and very tough and unrewarding for quite a while.

Even though NLS was a lot easier to learn than a violin (10 hours is really not a lot of extra time to invest), the “value uncertainty” was a big drawback in acceptance. When people complained, Doug would say “Well, people are going to be using stuff like this 6 hours a day — they should learn an efficient interface”. Almost no one believed him.

The later Parc Altos had both mice and the five-finger chord keyboards. One of the motivations for designing a much simpler, easier-to-learn GUI (originating with Smalltalk) was that we were working with children, and we had big goals for how they should allocate their mental resources between a UI and the content. (This is the scheme that Steve Jobs saw and that was adapted for the Mac.)

The object oriented UI of Smalltalk had selection of object first, then <command> (which also served as “command accept”). This was both modeless and more compact, and proved to be more intuitive for the children.

A lot of experimentation was done with the three button Parc mouse. The Dynabook design required tablet/stylus (in part so the kids could draw things) but these were still really expensive, and the Parc “living lab” required things to be replicated for 100 users (for a personal computer, meaning: we needed to build 100 Altos*). So we looked at the mouse. The NLS mouse had real problems (used potentiometers, was noisy, etc.) so Chuck Thacker (I think) came up with a relatively small, very high resolution mouse using shaft encoders. This was really good, with the only drawback that it needed to be cleaned periodically. Eventually it was replaced with several varieties of “ball mice” and also Dick Lyon’s first optical mouse.

There were numerous UI designs that used:

one button of the three-button mouse (imitating a stylus/tablet). This was usually in conjunction with several different “gesture recognizers” like those in the GRAIL System at RAND.

two buttons (this was taken up by the later Star office system product). Basically pointing-and-clicking and “properties”. (Sound familiar?)

all three buttons (this was used by most of the technical people: one for pointing and clicking, and the other two to bring up menus).

Mixtures of all three of these are still in use today.

Note that a good gesture recognizer has some of the traits of NLS. Basically, the gestures allow one to be very fluent and fluid with the UI, at the cost of learning them. (Today’s situation is terrible: the UIs use simple gestures and don’t teach comprehensive ones — this is kind of for 2-year-olds and 92-year-olds, and no one in between!) Experience with both NLS and GRAIL convinced us that a system with a really great gesture and writing recognizer still needed a keyboard (hence the inclusion of a keyboard along with the stylus and touch-sensitive display in the design of the Dynabook).

It’s worth contemplating the tradeoffs between having multiple buttons on the mouse vs “command keys” vs. having something like the NLS chord keyboard for more possibilities.

Where we are today is that most people use their systems much more than the 6 hours a day that Doug predicted, but they don’t want to do any concerted learning. This gives the worst of both worlds, in that people put in amazing amounts of time avoiding the smaller amounts of time needed to learn a real skill (this is “Humans 101”!) This is basically a bicycle with training wheels where no one realizes there are training wheels — so almost no one ever learns to ride a bike!

And most of the UIs today don’t provide any learning curves for the smaller percentage who can see that learning real skills would make a big difference.

What guitar has the best tone?

Let’s leave “best” as subjective, and look at ranges of tone as played by you (or by some very accomplished player).

I want to discuss classical guitars, but first a Telecaster story. Some years ago, more or less by accident I picked up a Fender Telecaster 50th anniversary instrument (ash) and loved both the action and the sound, so I bought it. Teles have a huge aftermarket so I bought another 50th anniversary instrument to use to play around with gear substitutions. To my great surprise the second one I bought was not in the same class of feel and sound, even after some work (it was still “quite good”).

Now for classical instruments. Guitar Salon International in Santa Monica has an enormous inventory of high end classical guitars, including many important historical instruments, and all in great playing condition. They have a nice big room with a high ceiling and great acoustics, and they will let you try out guitars all by yourself for several hours at a time via an appointment. I have bought several very good instruments here over the years.

Each player has an internal image of “kind of how” a piece should sound, and they will aim for this sound. So a particular player will sound like “themselves” from instrument to instrument (this is amazingly true even on a good piano).

But once you adjust to this, the differences between instruments, and to the point here between really good instruments, are still quite large (similarly with pianos). With guitars, e.g. a Ramirez really sounds different than a Hauser. And both are very different from (say) a Michael Thames version of a Torres “La Leona” with a “tornavoz”, a kind of Helmholtz resonator made of brass that Torres installed in many of his instruments in the 19th century to reshape (marvelously) both the bass and the treble response.

Some guitars want to sing (and have a lot of sympathetic ringing between strings) and others are more interior in sound (and need to be coaxed to sing). Some are bright, some are dark. And so forth.

(With pianos, the differences between a Steinway, a Bosendorfer, and a Fazioli are palpable.)

Classical guitars differ in the soundboard woods (spruce and cedar give very different responses), and there are now many kinds of “double tops”, which have paper-thin woods as the “bread” of a sandwich (maybe spruce for one slice and cedar for the other, with something in between, such as “nomex” or balsa wood). The latter tend to project a lot more but have a different tone.

They differ in body woods (though Torres once made a great guitar with a papier-mache body to show that it was the soundboard that was doing the work). The bracing on the soundboard is another big alchemy.

There is “tuning” of the sound board via graduated thickness.

Finally, there is a very large range of kinds of strings that are available, from real gut (some of which is actually made carefully enough to be like the gut strings of old), synthetic gut of many kinds, nylon, “carbon”, and more and more (even Japanese fishing line has been used by one of the great guitar players of our time!).

And so forth. Many of these choices actually do make a difference with the sound that a good guitarist can coax forth. Most players use fingernails on their right hand to pluck the strings, but there are whole schools of thought about this. And some players prefer to play with fingertips (as many players over the last hundreds of years have used on the guitar and its ancestors, and also for the lute). The choices here will in turn be aided by the types of strings preferred, and these also affect the “raw material” sounds of the instruments.

In my view the differences are quite large and important, especially if one is very sensitive to sound.

What’s a good introductory instrument to learn for someone who knows nothing about music?

Let me suggest “the voice”. This is because “musicality” has to be developed internally for any kind of musical production to be “musical”. If you take lessons on another instrument, at some point the teacher will try to get you to sing what you are trying to play (and conversely, good players are always “singing inside” and use this to help expression on what they are doing externally).

The need to take breaths and the connections with speech and prosody automatically provide the start of phrasing, and both ways to do it, and to ask good questions about it (when can I take a breath without breaking up the line?)

It’s also the case that most people can say and sing really complicated rhythms long before they can make them happen through an instrument.

If you are playing an instrument that has flexible pitch — strings, winds, brass — then you need to hear what the pitch should be so you can make it.

If you are playing an instrument that makes the pitch for you — keyboards, guitar, etc. — then you still need to be able to hear the sounds you should be making.

If you are reading music, you need to learn to hear what “the dots” indicate (and with phrasing — as in reading in natural language).

If you are playing by ear — you need to develop an inner ear.

And so forth for many more reasons, perhaps the two most important of which are (a) “the music is not in the piano” — so it needs to develop in you, and (b) learning to sing will help you pay much more attention to what is actually going on in music (it’s a lot more — and a lot better — than most people think).

One final thought. One of the most noticeable things in teaching young children music is how musical most are via singing and dancing, and how introducing even a very simple pentatonic Kodaly instrument will cramp their musicality, sometimes to the point of discouraging them, perhaps forever.

So a really interesting question should be: what are the best ways to help people learn to sing both on the outside and especially on the inside?

Recently I’ve been considering trading the first “real” guitar I ever bought myself for a Stratocaster. I’ve loved the sound of Strats for so long, but I’m scared to give away my first guitar (it’s a Telecaster). Should I trade it?

I would urge you to keep it (and make a plan for how to get a Strat). This is because “bonding” with an instrument is more than learning to use a tool — it becomes an extension of your psyche, imagination, even soul. You don’t want to amputate anything that does this for you. But you can always bring in another that extends you in other dimensions and bond with it also.

What guitar surprised you the most (positively or negatively) when you first played it?

Originally written ca. 2018

I’ve played lots of different guitars over the last 65 years, mostly electric — especially jazz guitars — and somewhat intermittently — with some long stretches professionally, gaps for grad school and research (I’m the kind of player who has to practice a lot so gave up playing for a while) and then classical keyboards for years. Over the last 10 years it’s been classical guitars with a recent 4 year break because of a ruined left thumb (age, arthritis, etc).

Via trying a lute a few months ago, I’ve been playing again, now on shorter scale smaller 19th century style instruments with lower tension gut-style strings, fingertips instead of nails, and my thumb has just barely been tolerating this. But it’s wonderful to be able to play at all.

This got me to start looking for guitar makers who liked to be influenced by the 19th century — YouTube is a great resource. I found Michael Thames, who has been making a wide range of instruments for more than 40 years, many of which are 19th century influenced or “copies”, especially of instruments by Torres, who pretty much set the direction for classical guitar design after 1850.

Many of these sounded fantastic on YouTube — but it’s always difficult to tell, there is usually added reverb and equalization, etc. And a great player can really make a difference. Here’s Matt Palmer playing a Michael Thames instrument that is a “copy” of a Torres “La Leona” guitar (using a “tornavoz”, a kind of internal Helmholtz resonator that Torres pioneered to lower the resonant frequencies of a guitar and to make the trebles “sing” — it is a brass tube around the sound hole and you can see it in the opening seconds of the video).

https://www.youtube.com/watch?v=9VVcJgE6rgw

Just on the strength of these, I contacted Michael Thames (a very fun experience) and he was happy to start making me several shorter scale versions of some of these instruments.

A month ago I finally got back to LA and visited the amazing “Guitar Salon International” in Santa Monica which has an enormous inventory of top quality instruments (and where I’d bought several of my classical guitars, including a Ramirez 1A, and a 1950 Rodriguez).

They had a Michael Thames “La Leona”, but in a standard scale (650mm) and a neck size that is too big for me now. I tried it anyway.

This was a complete surprise. It sounded shockingly better than any of the recordings on YouTube (and “kind of cosmic” compared to very good classical guitars). And it played better by far than any classical guitar I’d tried up to then. The action was nicely low, but it was the “seatability” of getting a string down to a fret that was so much better than on other instruments.

They let me borrow it for a few days, and that was a few days of heaven. The tornavoz — as invented by Torres and reinvented by Thames — is what every classical guitar needs. It brings at the same time to the bass: depth, clarity, fullness, and “largeness”. The treble is even harder to describe, but it is definitely “singing” in many different special ways. The recordings do a better job rendering the treble than the shock of what the bass actually sounds like.

This is why I quote the word “copy” — this instrument sounds better than the original “La Leona” (that is still around today). I think Thames’ instrument is a combination of the best from the past and from what is known today (especially by him).

It was hard to give it back to GSI. The extreme playability of this instrument almost disappeared my current thumb and strength problems even on the more difficult stretches. The sound was indescribably wonderful.

I’m looking forward to the new guitars by Michael Thames — the first in a few weeks.

Still, I have “that feeling” about the one I got to play for a few days.

---

I wrote the above a few years ago. Recently, Jeff Dwyer posted an interesting comment that I’m putting here with my reply:

Jeff Dwyer: I sympathize with you on the thumbs. I’ve got about 6 decades of playing behind me too, some professionally, and a lot of physically demanding sports and jobs. My thumbs are both trash. I get cortisone injections in both CMC joints, but they’re not working so well anymore. The MCP joints are also gone. Looks like surgeries in the future. Anyway, I feel your pain. I can’t play for very long these days, and I can’t reach the chords I used to without a lot of grimacing. But I’m not giving up, and I hope you don’t either.

Hi Jeff: I wrote the answer a few years ago — this is being written in June 2021. Cutting to the chase, the two guitars from Michael Thames were terrific, especially a “Mirecourt copy” with two extra bass strings and a tornavoz (scale: 620mm). This is just a really great guitar in all ways.

image

I was able to play this for about 18 months until my left thumb couldn’t continue.

Meanwhile, I decided to really learn the lute (it turns out to be quite different from the guitar in most ways, including how “specific” it is about pretty much everything on both hands — more like the violin in that respect). There is much less transfer from guitar than I thought. I also started to explore the lute instrument world, and after quite a few instruments, completely lucked out when a Paul Thomson 7 course Renaissance lute (1997) showed up (he is the “Strad” of lute making, and the one I wound up with can only be described as “luminous”).

image

It is wonderful to be able to learn and play this at age 81! The lute has an immense repertoire over many centuries. I think it has many more “intermediate level” pieces of first class musical quality and beauty than any other instrument. It doesn’t produce a lot of sound, but it has more “presence” than instruments like the guitar, piano, harpsichord, etc. It does what a low level brass choir would do (where two notes around middle C will fill a room). Here’s a sample of a “tone piece” from ca 1520s (so 500 years ago) that I did as a birthday surprise for my wife last year (good headphones advised).

https://youtu.be/y4YrXQLyFYY

Why are there frets on a guitar? Wouldn't a guitar player be able to play from hearing? They seem more annoying than useful to me.

I would like to drill deeper into the issues than the other answers.

Issue 1 - playing in tune at all

There’s the question of “what does it mean to play in tune on any instrument — especially where harmony is involved?” On instruments that are capable of microtuning on the fly — like the violin family, the woodwinds, the brass, and the human voice, etc. — ensembles of players with good ears will tend to play or sing notes in chords so that the reference pitch of each note agrees with the harmonic pitch of the fundamental note of the chord. Let’s call that chord “C” major, and pretend that the reference pitch of “C “ is 100Hz. Its harmonic partials will then be 200, 300, 400, 500, etc. In the C major chord of C, E, G, we could choose the pitches for the E and G notes from the pitches of the harmonics in the root note C. So the reference pitch of the E would be an octave of the 5th harmonic 500 (so 125, 250, 500, 1000, etc.) and the G would be an octave of the 3rd harmonic 300 (so 150, 300, 600, etc.). A closed chord C, E, G, C would then be 100, 125, 150, 200.

This is so nice sounding it would be great to get all the notes in our scales from this principle. We could start with F (which in this scheme would be an octave of 33.33) and “run the 5ths” to get F, C, G, D, A, E, B, F#, etc. This will give us serviceable “white keys”. Except …

We do this by multiplying by 3s. So F = 33.33…, C = 100, G = 300, D = 900, A = 2700, E = 8100. If we start dividing the E by 2s to get its suboctaves we find that the E gotten this way is ~ 506, not 500! It is inharmonic.

But if we use the one that is harmonic with C (E = 500) it will then be flat for what the 5th should be for A (E = 506).

An even quicker way to see the larger problem is that “running the 5ths” (multiplying by 3s) cannot get back to an octave of the starting note (octaves are gotten by multiplying by 2s — and 2 and 3 are relatively prime).
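
The arithmetic is easy to check. Here is a short Python sketch using the same numbers as above (C = 100):

    # "Running the 5ths" by multiplying by 3, vs. taking the 5th harmonic directly.
    C = 100.0
    E_from_fifths = C * 3**4      # C -> G -> D -> A -> E gives 8100 when C = 100
    while E_from_fifths >= 1000:  # drop octaves until we are next to the harmonic E (500)
        E_from_fifths /= 2
    print(E_from_fifths)          # 506.25, not 500: the two E's disagree

    # And twelve 5ths can never land exactly on an octave of the starting note:
    print((3/2)**12, 2**7)        # 129.746... vs 128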

Something has to give. The nice thing about the microtuning instruments is that the players can adjust on the fly by listening to the harmony. So e.g.: a cello player will play the Bb on the top string in a slightly different place if the surrounding chord is a Gm or a Bb major or a form of Eb chord — this is so the harmonics of the conglomeration will fit together as nicely (“as harmoniously”) as possible. (The players are not thinking of the names of the chords so much as listening to see if the whole is “sweetly in tune”.) A little thought will reveal some pernicious cases, but these are smoothed out in practice.

We can see that a keyboard instrument is a real problem with just 12 pitches available in an octave. The problem has been dealt with in 3 main ways:

(1) restrict the number of key centers and tune the notes for those key centers (so e.g. you might be able to play in the key of C, but not in the key of C#) — retune the instrument for different pieces if needed. The former has been used for baroque organs and harpsichords, and the latter for harpsichords, etc. These schemes of “unequal temperament” have sometimes been used in compositions to express different feelings in going from smoother to rougher chords and back.

(2) put more pitches per octave on the keyboard (split the sharps, etc.)

image

An Italian-style harpsichord with 7 extra pitches per octave (i.e. both F# and Gb are available, etc.) for 19 total pitches per octave instead of 12.

This is a pretty good solution, and the split sharps are usually almost as playable as a single sharp. (Some historical organs and recreations even have some of the sharps split in the pedal notes!)

(3) tune the 12 pitches per octave equally out of tune with each other to compromise (this makes every major chord out of tune, especially with 3rds that are much too wide). This is the solution most in use today — its main benefit is to allow composers to move key centers without getting “wolf” notes, especially in chords. Keyboard instruments with lots of harmonics in each note (like organs and harpsichords) produce a kind of a “grit your teeth” sound, especially if the music (typically baroque) has been written with much purer chords in mind (for example, many Buxtehude organ pieces want “sunlight” for their ending major chords, and the equal tempered version of these does not have this effect at all).

We can see that fretted instruments (with — say — 12 frets to the octave) will have the very same kinds of problems, especially too wide 3rds and too narrow 5ths if equal temperament is aimed for. Many of the early tunings for lutes were in the scheme of (1) above, and it was also with the lute that the first attempts at (3) were made.
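
To get a feel for the size of those compromises, here is a quick Python check of how far the equal-tempered 3rd and 5th land from the harmonic (“just”) intervals, measured in cents (1200 cents to the octave):

    import math

    def cents(ratio):
        """Size of an interval in cents (1200 cents = one octave)."""
        return 1200 * math.log2(ratio)

    equal_third = 2 ** (4 / 12)   # four equal-tempered semitones
    equal_fifth = 2 ** (7 / 12)   # seven equal-tempered semitones

    print(round(cents(equal_third) - cents(5 / 4), 1))  # ~ +13.7: the 3rd is too wide
    print(round(cents(equal_fifth) - cents(3 / 2), 1))  # ~ -2.0: the 5th is slightly narrow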

Some of the early fretted instruments — lute, vihuela, etc. — had frets made from gut and tied on — so they were moveable. This allowed the player to nudge a fret to help an interval (at the expense of other intervals). A favorite one is the equivalent of the 10th between the low B on a guitar and the D# on the 2nd string — the fourth fret can be angled a little flatter at the top to get the D# more in tune for pieces in E major and E minor, etc. (on the lute the B is a D and the D# is an F#, etc.).

I couldn’t find a picture on this pass, but some lute players would stick a little something to a string that would act as a fret when you pushed it down there behind the regular fret. This gave the player two pitches to choose from for that note … (I haven’t tried this …) I think this has also been done a few times on guitars.

On a guitar the frets are fixed, and you can bend strings sharp but not flat (so you could tune a little flat and bend to the better pitches — but this makes chord tuning a real problem). Similarly, getting rid of the frets on a guitar allows microtuning but makes polyphonic playing in good enough tune very difficult (for most cases, more difficult than the net gain).

But some of the keyboard solutions can also be done on the guitar. Here is a guitar by a master builder in France that allows for adjustable microtuning, and especially for perfect thirds in certain keys. The little “fretlets” slide up and down in grooves. If this is “fixing some of the guitar tuning problems”, it is also an illustration of how large the tuning problems on standard guitars are.

image

One of several solutions to “the frets tuning problem”. In this picture, the guitar is tuned to one of the (1) tunings which makes some keys really good and other keys not playable.

Issue 2 — the problem of different diameters and tensions on fretted instruments

Now we need to add the problem that the strings on a fretted instrument are not remotely similar to each other (and their partials are also not perfectly harmonic). One of the ramifications is that equal length strings of different diameters and masses will really be out of tune on any given set of frets.

One of the compensations on a classical guitar is to angle the bridge. This doesn’t work very well. More modern guitars will have both length and height adjustments individually for each string. It’s worth noting that this only works a little: the compensations will not work equally well for every fret. On classical guitars, the G string is just too fat as an unwound string (its partials are out of tune), and it has proven very difficult to make a wound version that works better.

Part of this issue is that the amount of pressure needed to get a string down to a fret increases as you go up the neck. This will sharp the pitch a little more and more, also adding to the out of tuneness.

So:

It is very hard to get a guitar into any kind of reasonable tuning! This will obviously bother some players very little (one hears really out of tune guitars all the time, even in YouTube demos by pros, etc). I think many players and listeners might even associate the out of tuneness with a kind of timbre.

More musical players will have found schemes to compromise the tunings on their particular instruments, and will often compensate so that the main harmonies they want sound as much in tune as possible. Another compensation is to roll off the high harmonics — via nylon strings, the use of “jazz guitar boxes” designed to emphasize the tenor range, filters, etc. Use of vibrato and bending will get around some of the tuning problems.

Still:

The thing that is so wonderful about a guitar is how much expressive music can be gotten from just six strings in such an intimate setting, especially on an instrument that embodies some of what classical style instruments provide. I think the two main solutions that would help a lot in the future are:

Somewhat adjustable frets — with better mechanics for adjustment. I think this is a whole new area for design that has not been explored much at all since the time of the lute, which addressed the very same issues.

Much better strings for both classical and electric guitars. In the 21st century, it should be possible to make materials that have the right strengths, densities, harmonics, etc. to provide a much more uniform response when put on an instrument.

Added Nov 2023

And:

You could get a fretless guitar made, as Chet Atkins did and used in this video with Jerry Reed and Pat Bergeron (they also get a great groove going in this one). You can hear the tradeoffs.

Jerry Reed & Chet Atkins - "Summertime" (Live)

How did the discipline of science spread after its invention?

A large question for a small space! On his landmark BBC/PBS show — “Civilization” — Kenneth Clark said: “People in the Middle Ages were as passionate about the Truth as we are, but their sense of evidence was very different”. The term “science” has meant very different things in different eras — it once mainly meant “gathering of ‘knowledge’ “ (which still requires a definition of ‘knowledge’).

There is also the very large human problem of “cognitive biases” (there are lots of them), including “confirmation bias” (the strong tendency to see “evidence” that supports a pet theory, and to “not see” or downplay evidence against the theory). One way of dealing with this is to create a community, part of whose duty is to criticize and vet (or not) the theories of others.

Alfred North Whitehead pointed out that “The greatest invention of the 19th century was invention itself”. He meant that “invention” had become mainstream in a way that it most certainly wasn’t beforehand. One way to see the big changes here is to look at the exponential rise in patents. Similarly, the idea of “progress” can be considered an “invention” of the 18th century: it wasn’t that it hadn’t happened, but most people expected to die in a world identical to the one they had been born into (except who might be king, etc.).

Here we are talking about “invention”, and “progress”, and especially “science” as ideas in the larger culture.

One large part of the epistemology of modern science is “making good maps”, both actually and as an analogy for the whole pursuit. Maps in the Middle Ages depicted stories that people believed (they showed where the Garden of Eden was etc.) and they were not careful enough for navigation. When the Age of Exploration got going again (the Greeks and the Romans had been extensive explorers) really good maps were needed again, and this led to gradually being more careful, paying attention to evidence, using various kinds of reasoning and reckoning, special tools, etc. And criticism, and a community.

Practical navigators did not expect that a great map would be completely accurate or comprehensive. They also did not use the Aristotelian two-valued logic (in which “not-True” is “False”). The maps were “not-True”, but accuracy counted, so there appeared many gradations of “not-True”, some of them critically important.

Most of philosophy — and many people trying to do science — thought they would get to “Truth” if they were careful enough; if “ ‘God’ was a mathematician enough”, etc. So far this has been more of a snare than a guide.

This leads us to two large inventions of modern science. The first orbited around Newton and was kickstarted by Brahe, Kepler and Galileo. Newton’s “System of the World” results were so good, that they looked as though they might be “exact”. In the 19th century, instrumentation and observation got so good as to find tiny but real discrepancies between the wonderful mathematics and observation, even before Maxwell’s non-Newtonian models.

At some point some people realized that they needed to deal with the hermetic nature of mathematics when attempts are made to model and deduce the physical world. Math itself consists of representations of forms and operations that preserve “truth in language” in careful ways (as Bertrand Russell pointed out: “p implies q”). To use this in science one needs to be careful about the mappings between the definitions of forms and operations to the phenomena we can glean by observation. The history of science is partly that of humans gradually finding more “out there” than expected. Right now there is no reason to expect that enough has been found to allow the universe to be deduced by reasoning.

The second great invention of modern science came when science started to aim at “goodness of correspondence” between theories and observations, with special points awarded for good predictions, including ones that could find flaws in the theory. Just when is hard to pin down, but the 20th century is pretty coextensive with this way of looking at things.

Also, for those interested in the philosophy of science, it is worth looking at the ideas of C. S. Peirce, William James, etc., who towards the end of the 19th century started to wonder if “modern science” could be defined well enough on pragmatic grounds alone.

For a talk years ago I assembled and explained the excellent combinations of reasoning and observation of the Greeks during the Alexandrian era that led to a qualitatively accurate picture of our situation in the solar system. This included dealing with similar triangles, the size and the shape of the Earth, lunar eclipses, and the attempt to make a triangle of the Earth, the moon, and the sun (which was “good enough” to get “a huge distance to the moon” and an “enormous distance to the sun and its size”). This was “real science” in every sense of the word, but it didn’t stick.

Was this because of the lack of a larger community? The lack of the printing press? The social and governance systems of the time? The disinclination of the Greeks to do mechanics except as toys? Did this correlate with slavery?

In any case, it’s worth calculating human lifetimes since the time of Kepler and Galileo — at 70 years per, we have only about 6 lifetimes since then! And about 5 since Newton’s “Principia Mathematica”. A lot has happened in those 5 lifetimes (especially compared to the 1000s of lifetimes before).

One way to look at this is that science has flourished via the combination of a few really unusual minds per century, plus the fact that a large percentage of more normal minds can learn and do what the unusual minds have turned up. This has allowed an enormous amount of work to go on in parallel without requiring a lot of serial coordination. It was exploding even before its impact on engineering and technology was appreciated. A good perspective on this is to look at the history of the physical sciences in Japan after the Perry expedition in 1853 and the Meiji Restoration in 1868, which started the modernization of Japan. Just 67 years later, in 1935, Hideki Yukawa, educated in Japan, published his theory of mesons, and received the Nobel Prize for this in 1949.

One way to think about all this is that the “invention of science” really did take place in several stages, the most important of which were the inventions of perspectives on how to go about finding and getting and making “pretty good knowledge” and to share this knowledge in ways that transmit the deep foundations of what has been discovered (including new ways to discover).

How would you tune a guitar to match a piano with stretched tuning? Would you tune to the lower bass notes being played or just the middle C?

Robert Preston’s answer is good. To understand what is going on, it’s worth contemplating why tuners would stretch the octaves (make an interval sharper than a perfect octave) on a piano.

Seems like it would put the piano out of tune …

But it already is out of tune via the equal temperament system (which stretches the 3rds and narrows the 5ths). Also, the extreme tensions on most of the strings produce “partials that are not harmonic” (meaning that the strings themselves are a bit off, and the partials, especially the octaves, are not in tune; this produces beats and a sour feeling). The stretched octaves are done to try to compromise between the fundamentals and the partials for octaves (especially), and for other critical intervals (like 10ths).*
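
A rough way to see why the octave partials go sharp is the textbook stiff-string model, in which the nth partial is about n times f1 times sqrt(1 + B*n^2), where B depends on the stiffness of the string. The value of B in the Python sketch below is an assumed, merely plausible number, used only to show the direction and rough size of the effect:

    import math

    # Stiff-string model: the nth partial is about n * f1 * sqrt(1 + B * n**2),
    # where B depends on the string's stiffness. B below is an assumed, plausible
    # value, not a measurement of any particular piano string.
    f1 = 220.0
    B = 0.0004

    partial2 = 2 * f1 * math.sqrt(1 + B * 2**2)
    print(partial2)                               # ~440.35 Hz: sharp of a true octave (440)
    print(1200 * math.log2(partial2 / (2 * f1)))  # ~ +1.4 cents sharp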

Because almost all guitars have non-moveable frets and the modern instrument dates from the 19th century, which was very invested in equal temperament, the frets are similarly out of tune harmonically (the fourth fret produces a note a tad sharper than a harmonic third, the 7th fret is flatter than a harmonic 5th, etc.). Lutes have moveable frets made from string material, and one can often adjust these to get really good 3rds, 6ths, and 10ths for important notes in a given piece (this can be heavenly, and it would be great to have on a guitar).

Most guitar players who are sensitive to tuning will “tweak” intervals for a given piece (especially to try to get 3rds not too sharp). This is tricky because many notes on a guitar are available at the same theoretical pitch on four or five strings — but they may not sound exactly the same pitch. Also, the nature of guitar strings, the short scale of the guitar, etc., make it difficult for even fresh strings to really sound in tune (G strings are notorious for this). For all the strings to use the same frets, the bridge needs to be bumpy, and even this is not enough. Even just for tempered tuning, the frets on a guitar need to look like this:

image

You can see that both the G and the B especially need very different fret positions (see “TrueTemperament” on the net for more info). And all this just gets you good equal temperament tuning, not “perfect tuning”.
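
For comparison, ordinary straight frets (the layout that the “TrueTemperament” shapes are departing from) follow the equal-tempered rule: each fret sits where the remaining string length has been shortened by a factor of the 12th root of 2. A quick Python sketch, using a common 650mm classical scale and ignoring all of the compensation issues above:

    # Ideal equal-tempered fret positions, measured from the nut, ignoring the
    # compensation problems (string stiffness, pressing force, etc.) discussed above.
    SCALE = 650.0   # mm, a common classical guitar scale length

    for fret in range(1, 13):
        distance_from_nut = SCALE * (1 - 2 ** (-fret / 12))
        print(f"fret {fret:2d}: {distance_from_nut:6.1f} mm")

    # Fret 12 lands at exactly half the scale length (325.0 mm): the octave.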

So, back to Robert Preston. If you are playing with a piano, and you are sensitive to “in tuneness”, you will already be tweaking your guitar and grumbling, so including the extra tweaking to compromise with the piano is not a big addition to the overall chore.

---

I have a “fortepiano” in the style of 1795, and merely stretching the octaves is not enough (there was no uniform approach to design in those days). We found that we had to tune it via 10ths to get something that would sound nicely enough in tune (especially for early Beethoven).

Which is more difficult: playing piano or playing guitar?

The answers to this question are like a cross-section of US culture. The one I thought closest to useful was Roberto Micarelli’s answer to “Which is more difficult: playing piano or playing guitar?”

Here is a different way to think about this question, and many of the conclusions can be reached even if you don’t play an instrument at all.

If we think about playing an instrument from the point of view of both the performer and the composer, we should not be surprised that both of them try to explore every part of what is musically possible. The result is that in the very real sense of what it takes to do something extreme, every classical instrument is as difficult as any other.

I have played all the classical keyboards (with a main emphasis on classical pipe organ). And I’ve played most kinds of guitar (including jazz guitar professionally, and classical guitar avidly). The large differences between keyboards and the fretted instruments don’t allow really interesting comparisons to be made. But — if a keyboard piece can be played on the guitar — the guitar can be made to sound and express almost cosmically more. This is because there are more expressive possibilities (the guitar can make many timbres for a note, it can vibrate and bend notes, etc.).

But I don’t think this is a centrally good way to think about this. Another way is to ask “how many hours of practicing” is it going to take to play such and such a guitar piece on guitar, and such and such another keyboard piece on a keyboard. If you throw in the side condition that the result should be “musical” not just “technical” then — in my experience — the pieces that are “difficult” to “really difficult” on each instrument wind up requiring commensurate practicing.

The beginning experiences on both kinds of instrument can be quite deceiving. For classical music, I think one can get going more easily on a keyboard (piano or harpsichord) than on a classical guitar. (Violin has a tough startup period also.) At some point in the future after one has gotten through a few stages of fluency, they all start to feel about equally difficult, but each in their own way. People always ask me — especially pianists — if adding in the pedals on the organ for one or even two extra lines of music isn’t harder than piano or harpsichord. My sense is “no”, just different (it is rather like playing different things with each hand on a keyboard instrument — the two toes and two heels are basically adding one more four-fingered “hand” that is made from your feet!).

A truly difficult instrument to play well is the cornetto — basically a hollow stick with 6 holes and a very tiny trumpet-like mouthpiece. It is quite difficult to even get a note out of it. There are no resonances to help nail the pitches, every note has to be lipped, etc. There are only a few really good players in the world — but a great player can create a cosmic experience with the amount of expression that is possible.

image

Having said all this about “all instruments being equally difficult”, I also somehow feel there is always more that can be reached for on guitar in terms of “mood”, “tone”, “expression”, etc. Some of my favorite pieces played by my favorite players are rather simple at the notes level (but are capable of deep cosmic expressive gestures). This is not a comparison with the keyboards (most of these dimensions are a bit more shallow on them).

But just to give you an example, here is a classic piece from a century ago by the Paraguayan virtuoso Augustin Barrios — “Julia Florida” — a love song for one of his students. The score is pretty simple and rather slow in pace (it could be considered an “intermediate piece” in the terms of the notes). But it is made for the player to treat every finger on the right hand as a violin bow and to magically render something that is not just chamber music, but is even “orchestral”.

I like Matt Palmer’s version of this, and he is playing a truly great instrument in his YouTube recording by Michael Thames — his version of Torres’ “La Leona” guitar from 1856. Getting all of this “out to us” so well is as difficult as any more obvious “zillions of notes” virtuoso piece.

https://www.youtube.com/watch?v=9VVcJgE6rgw&start_radio=1&list=RD9VVcJgE6rgw

Why hasn’t Alan Kay published a book?

Ouch!

What are some good resources for learning about scientific computing?

My basic answer is: read a lot outside of the computer field.

It is worth trying to understand what “science” means in “Computer Science” and what “engineering” means in “Software Engineering”.

“Science” in its modern sense means trying to reconcile phenomena into models that are as explanatory and predictive as possible. There can be “Sciences of the Artificial” (see the important book by Herb Simon). One way to think of this is that if people (especially engineers) build bridges, then these present phenomena for scientists to understand by making models. The fun of this is that the science will almost always indicate new and better ways to make bridges, so friendly collegial relationships between scientists and engineers can really make progress.

An example in computing is John McCarthy thinking about computers in the late 50s, the really large range of things they can do (maybe AI?), and creating a model of computing as a language that could serve as its own metalanguage (LISP). My favorite book on this is “The Lisp 1.5 Manual” from MIT Press (written by McCarthy et al.). The first part of this book is still a classic on how to think in general, and about computing in particular.

(A later book inspired by all this is “Smalltalk: the language and its implementation” (by Adele Goldberg and Dave Robson — the “Blue Book”). Also contains a complete implementation in Smalltalk written in itself, etc.)

A still later book that I like a lot, and that is “real computer science”, is “The Art of the Metaobject Protocol” by Kiczales, Bobrow, and des Rivières. The early part of this book especially is quite illuminating.

An early thesis (1970) that is real computer science is “A Control Definition Language” by Dave Fisher (CMU).

Perhaps my favorite book about computing might seem far afield, but it is wonderful and the writing is wonderful: “Computation: Finite and Infinite Machines” by Marvin Minsky (ca 1967). Just a beautiful book.

To help with “science”, I usually recommend a variety of books: Newton’s “Principia” (the ultimate science book and founding document), “The Molecular Biology of the Cell” by Bruce Alberts, et al. There’s a book of Maxwell’s papers, etc.

You need to wind up realizing that “Computer Science” is still an aspiration, not an accomplished field.

“Engineering” means “designing and building things in principled expert ways”. The level of this is very high for the engineering fields of Civil, Mechanical, Electrical, Biological, etc. Engineering. These should be studied carefully to get the larger sense of what it means to do “engineering”.

To help with “engineering” try reading about the making of the Empire State Building, Boulder Dam, the Golden Gate Bridge, etc. I like “Now It Can Be Told” by Maj Gen Leslie Groves (the honcho on the Manhattan Project). He’s an engineer, and this history is very much not from the Los Alamos POV (which he also was in charge of) but about Oak Ridge, Hanford, etc and the amazing mobilization of 600,000 plus people and lots of money to do the engineering necessary to create the materials needed.

Then think about where “software engineering” isn’t — again, you need to wind up realizing that “software engineering” in any “engineering” sense is at best still an aspiration not a done deal.

Computing is also a kind of “media” and “intermediary”, so you need to understand what these do for us and to us. Read Marshall McLuhan, Neil Postman, Innis, Havelock, etc. Mark Miller (comment below) just reminded me that I’ve recommended “Technics and Human Development,” Vol. 1 of Lewis Mumford’s “The Myth of the Machine” series, as a great predecessor of both the media environment ideas and of an important facet of anthropology.

I don’t know of a great anthropology book (maybe someone can suggest one), but the understanding of human beings is the most important thing to accomplish in your education. In a comment below, Matt Gaboury recommended “Human Universals” (I think he means the book by Donald Brown.) This book certainly should be read and understood, though it is not in the same class as books about a field, like “Molecular Biology of the Cell”.

I like Ed Tufte’s books on “Envisioning Information”: read all of them.

Bertrand Russell’s books are still very good just for thinking more deeply about “this and that” (“A History of Western Philosophy” is still terrific).

Multiple points of view are the only way to fight against human desires to believe and create religions, so my favorite current history book to read is: “Destiny Disrupted” by Tamim Ansary. He grew up in Afghanistan, moved to the US at age 16, and is able to write a clear illuminating history of the world from the time of Mohammed from the point of view of this world, and without special pleading.

30 years ago, what were the first 5-10 steps in designing and creating a computer?

30 years ago was 1988, and many components for many computers (especially personal computers) were VLSI, and could to some extent be TinkerToyed together.

But what if you had a special architecture in mind that would require considerable design at the gates and bits level? To some extent you would need to tread the ground laid 20, 30, and 40 years earlier. However, the era of technology available would still affect quite a few of your early design decisions in almost every consideration: power requirements, connectivity/cables, various kinds of memory resources, etc. For example, the advent of the Intel 1103 1K-bit memory chips in 1970/71, plus the available bipolar technology, allowed a very small group at Xerox Parc to make a whole mainframe (which emulated a DEC PDP-10). Before this it would have been more difficult, because of what magnetic core memories required, and because of what it took to get microcode to run fast enough to bypass more discrete control memories.

The Mead-Conway techniques — developed at Parc and CalTech in the 70s — allowed university students to suddenly be able to design and make and fab their own CPUs etc in the MOS technologies. Many of the early RISC chip designs came out of Berkeley — via Dave Patterson — using Mead-Conway.

In parallel, some of the early steps are to try to really understand and focus on the benefits desired and expected from what will be quite a bit of work, and to not over design. For example, the Xerox Parc Alto personal computer was “lovingly parsimonious” in concept (by Chuck Thacker) and is worth studying to see how a really great designer who was also a comprehensive computer scientist went about the business of doing serious design and engineering at the same time. One of his processes was to try to put as much of the cost into as much memory as possible, partly by making an emulation architecture that could emulate most of the slower devices in software rather than having to add hardware for these. The money saved here was used to make the “inner CPU” faster, which would allow more things to be emulated, etc. He used an idea that was used on the old TX-2 super computer — many program counters, and zero latency switching between them — to interleave up to 16 microcode tasks. And so forth.
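
Here is a toy model, in Python, of the “many program counters, zero-latency switching” idea (all the names and the tiny “programs” are invented for illustration; the real Alto had 16 hardware tasks with fixed priorities). Each task keeps its own program counter, and on every cycle the highest-priority task that wants service executes one microinstruction; nothing has to be saved or restored, because no task ever touches another task’s counter.

    # Toy model of hardware "tasks", each with its own program counter. On every
    # cycle the highest-priority task that wants service runs one microinstruction.
    # All names and the tiny "programs" are invented for illustration.

    class Task:
        def __init__(self, name, priority, program):
            self.name = name
            self.priority = priority
            self.program = program   # this task's own microinstruction stream
            self.pc = 0              # this task's own program counter
            self.wants_service = False

    def run(tasks, cycles):
        emulator = min(tasks, key=lambda t: t.priority)  # lowest priority runs by default
        for cycle in range(cycles):
            ready = [t for t in tasks if t.wants_service]
            task = max(ready, key=lambda t: t.priority) if ready else emulator
            op = task.program[task.pc % len(task.program)]
            task.pc += 1             # only this task's counter advances
            task.wants_service = False
            print(f"cycle {cycle}: {task.name:8s} {op}")

    disk = Task("disk", 10, ["transfer word"])
    display = Task("display", 5, ["fetch scanline word"])
    emulator = Task("emulator", 0, ["fetch", "decode", "execute"])
    display.wants_service = True
    run([disk, display, emulator], cycles=4)
    # cycle 0 serves the display; cycles 1-3 the emulator picks up where it left off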

Looking at things today, one can learn a lot by looking at how memory architectures can intermingle with “CPU” processing. There are enormous (even horrendous) bottlenecks to be found everywhere.

One way to learn is to design and make some computers to try out various ideas and tradeoffs. You can buy inexpensive Field Programmable Gate Array development kits that plug into your laptop and allow you to create supercomputers with dozens, even hundreds, of processing elements. The gates are all the same for everything, so you can distribute storage and processing functions everywhere (and even merge them).

Why was personal computing an unusual research choice in the 1960s?

I think Mark Miller’s answer is pretty good. One distinction that might help is that between “personal computing” and “personal computer”. The drive for interactive time-sharing of the large mainframe computers that started right around 1960 was a drive for “being able to work on your own computer whenever you wanted day or night”.

Doing this in a general way — where it seemed that you had the whole computer to yourself to do anything you wished — was preceded by multiple terminal sharing of specific applications (for example in the SAGE air defense system of the 50s where very large physical computers (football field size!) would have 30 or 40 graphics terminals with pointing devices to track planes via real-time radar data).

image

Here is an example of a SAGE console from the 50s. And below is a diagram of a SAGE “blockhouse”.

image

The dual mainframes were on the second floor which was about the size of a football field. The consoles were on the 4th floor, and there was an early version of a display projection system on the third floor. The last one of these blockhouses was decommissioned ca 1982!

John McCarthy’s take on this setup and the console was “Every home will have one of these” — because the analogy was to utilities for electricity, water, gas etc. John could envision such a utility for “information” (and that this could be an early way to make the “memex” that Vannevar Bush had described in the Atlantic and Life magazine in the mid-40s). Today, this would be called a “cloud” idea. John was one of the forces to get MIT to make such a utility for its faculty and students, and this first opened up for interaction ca 1961.

The experimental computer for SAGE was the TX-2 at Lincoln Labs, and it was much better set up hardware-wise for sharing than the commercial mainframes (MIT had to use a commercial IBM mainframe). But the whole machine was given to Ivan Sutherland from 3am to 6am in 1962 so he could invent modern interactive computer graphics (he had to use half the capacity of this supercomputer to simulate a workable display system).

The main designer of the TX-2 — Wes Clark — also created in 1962 one of the very first “personal computers” — the LINC — for use in biomedical labs (this was a really nicely appointed machine for its day). Consider that it had all of 2K 12-bit words as its main memory, all for one person to use for whatever they wanted!

image

This — along with the earlier Bendix G-15 and the early minis such as the PDP-1 (also a Whirlwind and Lincoln Labs spinoff) — provided vehicles to think about how individual small computers would trade off with information utilities (which were very inefficient for the user-interface tasks, especially for graphical interaction). Early on, small computers were used both as standalone graphics machines and as graphical terminals.

Doug Engelbart and the Augmentation Research Center (an ARPA project at SRI) gave what is now called the “mother of all demos” (it pretty much was!) 50 years ago this year. It used a time-sharing computer, but it was an archetype of how the research community thought about “personal computing” back then (I think of what he showed as “personal computing”).

image

Bill English — co-leader of ARC — at an NLS terminal with black on white screen, mouse and keypad. There was a large debate within the ARC community about whether the next version of the system (in the early 70s) should be done again on a time-sharing system, or should be done on networked individual computers with a shared file facility. This eventually caused a schism, and a number of ARCers — including Bill and another central figure Jeff Rulifson — wound up moving to Parc (which was in the individual networked computers camp).

Butler Lampson has pointed out that Parc in the 70s was pretty much the only place that was trying to render the “ARPA Personal Computing Experience” idea on networked machines each of which belonged to a single person (part of the network also included shared laser printers, next level file servers, and connections to the ARPAnet, and then Parc’s own “internet”). And quite a few in the field did think this was a crazy research topic, especially those who believed that there were economies of scale in sharing mainframes (however, not if you valued user-experience). From the Parc standpoint, it was a very simple compelling idea in the world that Parc lived in, namely, the future 10–15 years out. We thought of what we were doing as simply building as much of that future early enough (a) for thousands of people to use and experience, and (b) to allow the much more difficult tasks of software and especially user interface design to have time to be invented.

As I’ve mentioned before in Quora, the key to the success at Parc was to be able to control, invent, and make the entire designs of hardware and software with the resulting stuff just barely working — but working well enough. Less synergy would have doomed most of the goals.

I’ll leave the story here with these tradeoffs between individuals and utilities in mind — these still exist, and are still not done well enough.

Was anything meaningful lost for good when Xerox threw out their old PARC data disks?

I don’t think that anything technical was lost. But from the standpoint of actual historical artifacts: certainly. As far as we know, the Smalltalk-78 image that we brought back to life a few years ago is the only extant version, and it was almost lost.

https://www.youtube.com/watch?v=AnrlSqtpOkw&t=135s

We didn’t generally recycle our disk packs, but just got fresh ones, so quite a bit of historical work was on the stash of disk packs (I left all of mine behind when I left Parc, etc.)

Who is working on developing the Dynabook today?

I don’t know specifically if anyone is working on the larger goals of the Dynabook. (When I first wrote this, I was as trapped in the physical design and the past, as anyone. So I missed pointing out that Bret Victor and his DynamicLand group have been working on something larger than the Dynabook, and which is very much in the whole ARPA/Parc spirit of “learning to think, communicate, and act qualitatively better with the intellectual and learning amplifiers that are possible with interactive computing”. Check out the DynamicLand website.)

image

A cartoon I drew in 1968 after a visit with Seymour Papert. The children have programmed their own version of SpaceWar, and are playing it using their Dynabooks (which are wirelessly networked, etc.)

There are certainly a number of people working on parts of the same visions and goals we had 50 years ago when the Dynabook idea appeared. I was asked to write a chapter about “What is a Dynabook” some years ago, and this will help the context here.

It is generally thought of as perhaps the earliest notion of a laptop or tablet personal computer, but the center of the idea was the kinds of services and contextual influences that are a bit more subtle (and there were actually three physical notions of the device: (a) tablet, (b) head-mounted display (via Ivan Sutherland), (c) computers everywhere in the world (via Nicholas Negroponte)).

Some of the contextual influences on me back then came variously from Maria Montessori, Doug Engelbart, Seymour Papert, Jerome Bruner, Marshall McLuhan, etc. Papert’s work with Logo and children got my thinking to change radically from “personal computing as tools for adults” to “personal computing being like reading and writing”, thus children needed to start the journey into this new kind of “reading and writing” early in their life.

The “reading and writing” idea, and children, force the computer to be regarded as “media”, not just “tools”, and both of these ideas have contextual aspects (meaning that it’s not just what is written in a book that is important but what happens to a person when they get fluent in reading and writing and in using this media to extend themselves). McLuhan, especially, had deep insights about this: “We first shape tools and then they reshape us”, “The medium [itself] is the [big] message!”

Montessori said (paraphrased) that “You can’t teach the 20th century in a classroom” (meaning you have to embed the new epistemologies into the fabric of the culture around the children, and so a school has to be a culture of these new ideas, not a lesson). Papert echoed these ideas by pointing out the difference between trying to learn French in a classroom vs living in France, and proposed that a “Mathland” that children could live in be manifested in a computer.

Both Papert — and especially Jerome Bruner — asserted (correctly) that children needed “intellectually honest” versions of the new ideas (in science, math, etc), and that most adult versions of these subjects are not suitable for the developmental state of children’s minds — what needed to be done was to invent children’s versions of these that would be real math and real science, etc.

And much more. In short, whenever you create technology, you are also creating a context that can deeply influence the course of thinking by the users (and this means that a technology is also a kind of curriculum). (These ideas were not at all thought about or known by the creators of Facebook, Twitter, etc, so they created disastrous “legal drugs” that most humans, especially children, have no defenses for.)

Many of the ARPA researchers in the 60s (especially Engelbart’s group) and Parc researchers in the 70s did understand the ideas about context and environment etc., and tried to use them to help what happens to users as they learn a new environment.

For the Dynabook work, we wanted to invent and embed ideas that would help children learn to think better in the fabric of the user experience (and we considered the new reading and writing medium to include programming, including programming of the system itself). Those who are interested in this might like to see the tribute I did for Ted Nelson a few years ago that used the Xerox Parc Smalltalk from 1978 (and which was the flavor of Smalltalk that Steve Jobs saw the next year).

https://www.youtube.com/watch?v=AnrlSqtpOkw&t=135s

The entire system was programmed in the very same language that the children used. This opened up many possibilities for learning and understanding down the road.

A number of good things were done subsequently — HyperCard, Boxer, StarLogo, Etoys (the ancestor of Scratch), etc. — but today — aside from Bret Victor’s DynamicLand — I’m not aware of any current projects that try to be comprehensive in the large about “computing as a medium for powerful ideas” etc. (I don’t have positive feelings about the current fad of “everyone should learn to program” because it’s really about “Why?” rather than “What?”.)

What's the best Donald Trump joke you have heard?

Donald Trump!

Isn't getting rid of the evil "state" like Haskell's approach, something every programmer should follow?

I wrote a Quora answer that touched on this question: Why is functional programming seen as the opposite of OOP rather than an addition to it?

Part of John McCarthy’s long-ago insight (one that, after 60 years, I wish computer people had grokked by now) is that you just have to model time along with everything else that is modeled. Then you wind up with parallel histories that hold both the memory of the things we want and need, and the memory of when. This renders “mutable” and “immutable” moot (“mute?”) and beside the point.
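
A minimal sketch of that idea (my own illustration, not McCarthy’s actual formalism): instead of overwriting a variable, each “assignment” adds a new (time, value) fact, so every earlier state stays addressable and the mutable/immutable argument simply doesn’t come up.

```python
class History:
    def __init__(self, initial):
        self.versions = [(0, initial)]    # list of (pseudo-time, value) facts

    def becomes(self, t, value):
        self.versions.append((t, value))  # never erase the past

    def at(self, t):
        # the value is whatever was most recently asserted at or before time t
        return max((v for v in self.versions if v[0] <= t), key=lambda v: v[0])[1]

balance = History(100)
balance.becomes(1, 80)    # "balance becomes 80 at time 1"
balance.becomes(2, 130)

print(balance.at(0))   # 100 -- the past is still there to reason about
print(balance.at(1))   # 80
print(balance.at(2))   # 130
```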

John’s desire to have both unambiguous reasoning about situations, actions, and causes, and to have changes over time, led to his powerful ways of dealing with these. I think most of the confusions today stem from too limited a view of computing — very likely from taking programming practices as “computing” (whereas they are distressingly distant from each other at the present time).

How did Alan Kay first get started with computing?

I got started via carpet-bombing reading as a just-turned teenager in the early 50s. I’m pretty sure that the catalyst wasn’t science fiction because, although scifi was into robots, it had very little to say about computers in those days (Robert Heinlein did mention Howard Aiken at Harvard in one of his books).

It was most likely something in Time magazine about “Giant Brains” that got me to go to the library and find a book with the title “Giant Brains, or Machines That Think” by Ed Berkeley, written just a few years earlier in 1949. This book was a popularization, but had a lot of detailed technical material in it (Berkeley had worked on Aiken’s Mark I computer at Harvard, and was an expert in making logic mechanisms from relays).

Included in this book was a description of a table-top relay machine kit that Berkeley had come up with — called “Simon” (you could buy the plans from him). There is a footnote that I didn’t notice at the time about how an 11-year-old kid named Ivan Sutherland had helped build the first version of this machine and had written a subroutine to do double precision arithmetic!

I didn’t have the money to buy the plans or even start fooling around with relays (even though they were easy to get on Courtlandt Street in NYC at the time). I was into vacuum tube electronics and high voltages.

A few years later, the next book I came across was “Faster Than Thought” by Britisher B.V. Bowden about computing in the UK, especially at Manchester. This described many different computers and how they were built.

About that time I was at Brooklyn Technical High School and the nearby technical college (Brooklyn Polytech) had made a Simon kit and let people play with it. It couldn’t do much, and the greatest pleasure must have been putting it together and getting it to work.

A few years later (ca 1960) my friend Chris Jeffers was working for United Airlines in Denver wiring plug boards for the punched card machines in their IT organization. I took a crack at a few things and thought it was fun.

I wound up getting drafted in 1961 and was able to get into the Air Force instead of the Army. One thing led to another, and while I was working in the path lab at a base hospital I heard about “a test no one could pass”. This was the programming aptitude test devised by IBM for the Air Force to screen for possible programmers. I liked to take tests, and for a lark took it and passed it.

This got me to San Antonio, Randolph Air Force Base, and Air Training Command. The initiation to programming was the intensive one week wall to wall course IBM had devised. At the end of it we knew the 1401 computer backwards and forwards and could program it.

We couldn’t design much. Back then the idea was to start programming other people’s designs so one was useful while gradually soaking up the more difficult systems design ideas and heuristics. This was done by implementing flowcharts, and in particular, flowcharts of punched card applications that the AF wanted to convert from the huge floors of rattling machines to run on small cheap computers like the 1401. The designers and makers of the flowcharts were called “programmers” and the human compilers of these into optimized machine codes were called “coders”.

The 1401 at Randolph was all of 8K bytes (6 bits of information + 1 bit “word mark”), 6 pretty fast tape drives, card reader, card punch, and pretty fast chain printer. Programs were on punched cards, most debugging was desk checking. Testing was to get the machine for about 3 minutes max in a day to run the program, have it not work and usually crash, get an operator to flip a few switches, look at a few memory locations, and get a core dump to peruse.

My best friend there was Rachon Andre Douglas from New Orleans, a former Russian linguist, who was a terrific programmer. I learned a lot from him, especially about using the computer to help program and run itself. Even with the tiny 1401 we built a micro-OS that could fit into a minuscule amount of memory at the top to control the running of jobs.

There was also a Burroughs 220 there — also a very idiosyncratic architecture — that I soon learned and started to program.

But that’s another story.

How would you design the perfect programming language?

“ ‘Better’ and ‘Perfect’ are the enemies of ‘What Is Needed’ ”

And the latter is usually a qualitative step or more above “normal”. This means that a good heuristic is to try to “dim out” the present and “normal”. For example, we could try to forget about all the programming schemes and languages we know about.

This gets us to a simpler more difficult goal: to invent qualitatively more powerful ways to get computers to do what we wish.

To do this we have to try to understand what it means “to wish”. Can we make wishes in forms that are understandable in ways unambiguous enough to make progress?

To help our thinking processes, we can search for fruitful analogies and see if we can get them to yield some gold for us.

For example, we could look at the strengths and weaknesses of constitutional law, and ask what it would take to get a computer program to understand the law, and especially the meta-law in constitutions.

For another example, we could look at Biology, and especially at morphogenesis (developmental biology) to get better ideas of “converging processes” that create viable systems in the presence of many kinds of noise and random problems.

We can look at successful societies to see how distributed IQ, knowledge, and wisdom can work to do more than any isolated component. We can try to understand how negotiation and explanation help the whole system design and build better.

We can devise sample “wishes” to see what good wishes might look like — to start we should come up with 20–30 of these and play with them until we desperately want to make them work. Then we can think about what mechanisms would be required that we will have to invent.

And so forth, and lots more.

In other words, when very high visions and goals are called for, one has to do a lot of meta-stuff to somehow both escape the current context and also somehow use what has been learned without it getting in the way. I’ve advocated “Learn everything, and then forget it except for the perfume!”. Meaning: when you forget you open up to more things — more “scents” — you can close your eyes and ears and find your way through your nose. When something possibly interesting appears, its odor will remind you of something you might have run across in the past, and this will help you think about the new idea. This works a lot better than trying to extend the past into the future (you just get an overcooked version of the past, and worse, the past you use will be all too familiar!).

The simplest heuristic is to deal with “wonderful goodness” for as long as it takes before trying to figure out just what steps are needed to start realizing “what is needed”.

What piece of hardware in a computer matters the absolute most when trying to build a super fast computer?

Many of the other answers addressed a somewhat different question than the one given.

As Ivan Sutherland (the father of computer graphics and VR/AR etc) recently pointed out at his 80th birthday celebration at SIGGRAPH, in the end the limiting factor and most pernicious bottleneck are the wires. For the last several decades Ivan has been working on “self-timed logic” (meaning that there are no external clocks, and no wires needed from them) and on “commingling processing and memory” (meaning there is not a separate module called “CPU” and others called “memory”).

All of these help with getting rid of wires and with limiting the size of the ones that remain. Also, this approach can help one understand just what the “action components” should actually look like.

How do I learn programming if I hate complexity?

If you really “hate complexity”, then perhaps you should take a shot at learning programming (really systems design) to see if you can help. Even better if what you really hate is “complication”. (In several talks, I’ve made the distinction between “complication” as “noise and mess introduced by humans” and “complexity” as something intrinsic to the phenomena.)

The basic idea is that if “complexity” pays its way by being interesting and important, it is worth understanding, whereas “complications” should be weeded out by finding better ways to think about and represent ideas.

A good example outside of computing is Maxwell’s Equations, which Maxwell originally wrote in non-vector form and as (many) explicit partial differential equations — these were reduced to the “fab four” by Oliver Heaviside by introducing vectors and higher-level operators (i.e. curl and div). In this case you have to learn something additional, but you are rewarded by greater simplicity in the new context.
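
For reference, here is the standard modern statement of the “fab four” in Heaviside’s vector form (textbook SI notation, added here for illustration):

```latex
\nabla \cdot \mathbf{E} = \frac{\rho}{\varepsilon_0} \qquad
\nabla \cdot \mathbf{B} = 0 \qquad
\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t} \qquad
\nabla \times \mathbf{B} = \mu_0 \mathbf{J} + \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t}
```

Maxwell’s original component-by-component equations (about twenty of them) collapse into these four lines once div and curl do the bookkeeping.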

A parallel in computing is the move from many different idiosyncratic machine codes to designing “higher level languages”, and then to designing “very high level languages” (which could be statements of goals to be achieved by the system — so: “programming in terms of requirements”, the “whats” rather than the “hows”).

Much of real programming today should likely be in terms of meta-programming (like the above), but the “field” — such as it is — has preferred to try to scale up assemblies of weak building blocks directly. This has not been pretty or useful for the most part.

More people who “hate complexity” (“complications”) are needed!

What are Alan Kay’s Learning Labs?

A “Learning Lab” was a get-together, held once a year (sometimes twice) for many years, of the most interesting people we knew to share ideas about learning, teaching, epistemology, art, etc., away from the regular world.

They were usually three day gatherings done collaboratively with the Apple Hill Center For Chamber Music in New Hampshire where I’d spent a number of summers learning how to play chamber music (partly as prep for the pipe organ playing I had started to learn). The Center was on an old farm actually on “Apple Hill” outside of Keene out in the woods. The 1740 barn was the concert hall, there were lots of cabins and practice rooms. The motto of the place was “Great People, Great Music, Great Food, Live In A Shack”. When a second one was done in the same year, it was usually in California in a “place with vibes” (such as Ojai or Catalina).

Later we invited Mitchel Resnick’s “Lifelong Kindergarten” group at the MIT Media Lab to be a main part. Some of the guests and regulars included Tim Gallwey, Seymour Papert, Marvin Minsky, Betty Edwards, Doreen Nelson, Arthur Hull, Richard Dawkins, John Steinmetz, Eric Stumacher, etc.

I’m pretty sure I got the idea for these from having been invited in 1975 — I think by Stewart Brand — to be part of “The Mind-Body Colloquium” which was held at the Zen Center Monastery in Marin County, CA. Stewart formed the idea for this partly from his experience creating and running the Whole Earth Catalog, and then from his interview with and the article he wrote on Gregory Bateson. I think they cooked it up. Stewart knew the abbot of the Zen Center — Roshi Baker — and it was a perfect place for this meeting. Some of the guests included: Bateson, anthropologist Mary Catherine Bateson (his daughter with Margaret Mead), Gordon Pask, Heinz von Foerster, Humberto Maturana, Francisco Ayala, Terry Winograd, etc.

Besides the discussions, we had a number of terrific learning experiences with the monks and the cooks (this was the ultimate in veggie cuisine, which later led to the famous San Francisco restaurant “Greens”).

Later at Apple, we started a large educational project — “The Vivarium” — and I started to think about boosters for our thinking. I had already gotten together Tim Gallwey (“Inner Game of Tennis”), one of the greatest teachers I had ever met, with Eric Stumacher and John Steinmetz as birds of a feather. And we cooked up the Learning Lab idea (and Apple had the funding).

There were perhaps as many as 20 of these, so right now I’m at a loss to compactly describe how they functioned, but I will come back to this to add a few sketches that will give at least a gist of some of the activities.

What did both PARC and the “tonerheads” at Xerox misunderstand about each other?

Xerox had many different kinds of people; I don’t think of them as “tonerheads”.

And while I was there — 1970 to 1981 — Xerox went through a number of ups and downs — from being “the fastest growing company in the US” in the 60s, to having to write off about 20% of their annual revenues a few years later when the computer company they had bought — SDS (renamed XDS) — failed. And then to experience enormous pressure on their core business a few years after that when they were forced to divest their patents.

From CEOs who were visionary — Joe Wilson — to his protege Peter McColough (somewhat visionary) — to a very poor choice — “nothing but business” Archie McCardell — to Dave Kearns (who might have been visionary, but got distracted).

Part of the story is that — when no one believed in plain paper copying and Chester Carlson’s xerographic process ideas — Joe Wilson and Haloid (later Xerox) bought up pretty much all of the patents and started adding their own. A good book to read is “My Years With Xerox: The Billions Nobody Wanted” by John Dessauer.

Possibly pressured by IBM — who had refused to help in the 50s but then wanted in on the success 15 years later — the FTC forced Xerox to make its patents public in 1975. The Japanese were quicker to act and in a stronger way than IBM and Kodak: in just 4 years Xerox’s market share went from 100% to 14%. This was both serious, and also a huge distraction.

Some of the people within Xerox — like physicist Jack Goldman who was the Chief Scientist — understood the implications of Moore’s Law and some of the implications of computing. This led him to set up Parc in the first place to have labs for what he thought would be the key leverage points for Xerox’s future: Physics, Optics, Computing, and Systems. Others in the company were not “digital thinkers”, and others were primarily business types who just wanted to make money safely, and who had moved in as the company went from a super-risk super-technical company in the 50s and early 60s to a cash cow.

It wasn’t that the Parc scientists were hopelessly naive, but because of the optimism, success, and nature of process that are part and parcel of science, it requires an enormous amount of effort on the part of a scientist or engineer to imagine that the majority of human beings don’t have similar contexts, and even more so that they won’t do something just because it is a good idea. Normal human decision making is much more like a mass mutual agreement on something that may have no particular rational or scientific basis — i.e. and e.g. “fads”.

Part of the idea of building lots of Altos, Ethernets, PUP nodes, laser printers, etc., was to make a “living lab” to vet the ideas, and the other part was to show Xerox something that was much more than a demo of a technology. We thought that this would be extremely convincing. In fact — for a variety of interesting reasons — it wasn’t at all. The big Boca Raton “exhibition” of the Parc technologies for the Xerox executives fell flat. In fact, their wives could see what it was much better than the execs. One of many interesting observations was that the execs couldn’t imagine doing any work themselves on an Alto. They could see some possibilities for secretaries, but missed what might be called the “Engelbart Context” about collaborative work and planning, and the “Lick and Taylor” context of “Computers As A Communications Device” (a 1968 paper they wrote).

The big misunderstandings were almost entirely about what human misunderstandings are usually about: human beings have a “private universe” in their heads, and communication requires some agreement on some of the contexts and world views of the private universes.

The big tragedy was not that Xerox missed what Parc did, but that the government funding that led to the large concepts of ARPA/Parc was discontinued in the early 70s by Congress (again from a lack of understanding). Parc was able to do the next stage of the ARPA ideas, but no commercial company could handle the ideas: not Xerox, not Apple, not Microsoft. That continues to be a tragedy of the first magnitude.

What was it like to be at Xerox PARC when Bill Gates visited?

Did he visit?

To my knowledge, he didn’t while I was there (1970–1981).

Austin wrote me to say that he had gotten the idea from this Gates quote: A Rich Neighbor Named Xerox — “Well, Steve, I think there's more than one way of looking at it. I think it's more like we both had this rich neighbor named Xerox and I broke into his house to steal the TV set and found out that you had already stolen it." Bill neglected to mention that he was a few years late to participating in his analogy to a burglary.

However, most of us who had been at Parc did not think of it as a burglary, and wanted to share the ideas, but the whole ideas. I once said to Bill “Bill, we want you to take the ideas, but please take the whole idea, don’t make a grotesque caricature of a misunderstood subset”. He did not look happy about this. Steve and Apple had much the same problem with subsetting.

Part of the difficulty was that neither company was willing to deal with the whole problem of personal computing and networking, which requires both hardware and software to be designed and built. Parc had been able to do a lot 10 whole years earlier because it went after what it considered the main problems and created custom hardware and software working together to make systems that would do the job. In the 80s the Intel and Motorola CPUs were not sufficient to handle the computing that needed to be done, but both companies backed off and only did things that these chips could do (often with enormous amounts of code — for example, the screen painting primitive at Parc was about 300 instructions of Alto microcode, but became almost 1000 separate subroutines for the Mac’s QuickDraw in 68000 machine code).

It took many years (decades) before these companies — always with immense resources compared to Parc — were willing to dip their toes into whole systems design.

It’s not clear that either company actually ever understood personal computing and networking at the corporate level.

How did Alan Kay read 20k books?

What I actually said was “I’m pretty sure I haven’t read more than 20,000 books”.

I didn’t count them. But I started reading quite early (~ 3 or so) and got fluent pretty quickly, and would be termed “voracious” from age 5 on. I’m 78, so if I read a book a day for 73 years, that would be over 26,000 books. I’m sure I haven’t read that many.

My guess is around 16,000 or so in the 73 years I’ve been reading “at scale”. By the third grade I had been reading quickly and with understanding, but I realized that I’d have to read books again if I didn’t remember them better over the long term.

Trying to explain “remembering well” is more difficult than “reading fluently” — both are most definitely skills (like the skill parts of sports or music) — so I will just look at the latter here.

Basically, skill learning is getting your “System 1”* to pay attention to the things you want to get fluent at. This system is almost invisible to our conscious mind and is hard to boss around (it actually bosses us around a lot more than is good for us). Tim Gallwey used to say “The parts of your mind you need to have learn don’t understand English!”.

Repetition with good focus helps (I also think getting started really early on many “System 1” learnings really helps).

One thing I found out was that trying to understand (i.e. “System 2”) while reading was not always a good strategy — on the other hand, being relaxed and “somewhat cognizant” would have big benefits hours later and especially the next day. I learned to sight-read music later in life, and got to look at the process more closely than I did at reading words when I was 5 and 6. Basically, you are trying to “grok” what’s important in the real-time phase of it, and then to let your memories and internal machinery chew on this for hours or overnight. Both kinds of reading are primarily about a kind of short-term memory for the real-time and then “allowing” the short-term memories to be retained.

From the standpoint of today’s still limited knowledge about how the mind works, it seems that we remember a lot more than we can retrieve — so learning how to recall what you’ve been through once is one of the keys. This “recall” will often be more or less invisible, but it can be fostered.

Well, I have strayed into “remembering” after all. One of the things that helped me a lot was one of the fallouts from doing lots of reading — I started to have many more things to make analogies with, and the analogies made remembering everything much easier. A side benefit is that analogies are also one of the best ways to have ideas (as Koestler’s “Act Of Creation” explains).

So I think the answer to the question is “less than 20,000”, “very much like other high level skill learning”, and “curiosity as a calling”.

---

Adding a few more thoughts: the cliches are quite true about “the thresholds where you realize just how ignorant and limited you are via education, and especially from wide scale reading”. These can be quite daunting. The US Library of Congress lists “More than 38 million books and other printed materials, 3.6 million recordings, 14 million photographs, 5.5 million maps, 8.1 million pieces of sheet music and 70 million manuscripts, 5,711 incunabula, and 122,810,430 items in the nonclassified (special) collections: more than 167,000,000 total items[1]”.

So 16,000/38,000,000 = 0.0004 = 0.04% ! One hopes (perversely) that not all the 38 million books are worth reading!

More importantly, it is likely that there are more worthwhile books about than a human could possibly read in a lifetime. This means there are very likely worthwhile points of view — even critical points of view — that one hasn’t encountered yet. This is a nice parallel — emotional as well — to the delights and dilemmas of doing and thinking science: you can’t wind up with the kind of blind certainty and belief that our genetically built brains crave and our cultures pander to. But by embracing a kind of uncertainty and judgement one can see many things more clearly.

It is also a parallel to why cooperation is a better strategy and tactic than competition in science, engineering, politics, and in life. And why arguing should be about advancing enlightenment instead of trying to win.

I also forgot to remark about the usefulness of “not so great books”. These provide an important context for “what people want to communicate”, and greatly help the appreciation of the great books. Fluency in reading helps a lot, so it doesn’t cost much in time to read a “not so great book”.

---

*see Kahneman’s book “Thinking, Fast and Slow”

What was Alan Kay's experience like working at Apple?

There were many Apples in the period I was there (from about May 1984 to about Sept 1996). This was not just the turnover in Chairmen and CEOs (from Mike Markkula, to Steve Jobs to John Sculley to Mike Spindler to Gil Amelio, etc.) but also having the BoD throw out Steve in 1985 made a big difference.

My opinion of the BoD was very low. They consistently passed up deep opportunities to grasp and make the future of personal computing and networking. For brevity, I will leave these out here.

Also, very especially in Apple, just who was COO really made a difference (main job is to make things happen that are supposed to happen and vice versa).

And who was head of product and marketing (e.g. the personality of Jean-Louis Gassee had quite an influence both positive and negative).

And Apple Software when it was directly part of Apple and then spun to the side as Claris.

Some of the high points revolved around Steve for the year we overlapped, and then John Sculley. The latter was willing to fund for multiple years a big education project I’d thought up (the “Vivarium”), and was the big backer of Hypercard when Apple Marketing didn’t want to have anything to do with it.

When John was forced out, there was no one left to really back “what’s next?”. For example, Hypercard was (a) a great start (b) needed a brand new version (c) needed to be made the basis of an Apple WWW browser (think about how wonderful this would have been!) The old Parc hands (who had also worked to help invent the Internet) put a lot of effort into trying to convince management to have Apple software be the main source of high quality wide spectrum authoring for the web — but to no avail.

The Newton was a good idea to do, but Apple Marketing forbade important features, fearing that it would then hurt the Mac.

Etc. Etc.

So there was quite a downward spiral happening after Steve and Sculley. I and my group left in Sept 1996 as Gil Amelio came on board (as far as Gary Starkweather and I could tell, he was both fearful (literally) and didn’t have a clue about what Apple was and could be about).

An important point — I think — is that when Steve came back he was able to start a new direction for the company — kind of “consumer conveniences” — but he was no longer interested in the ideals of personal computing and “Wheels for the Mind”. This was a shame (but people who see no higher than money loved it).

Despite all this, he and I remained friends.

How can I practice to play a classical guitar tremolo faster?

Lots of good advice in some of the other answers. However, perhaps we can ask a larger question: how can I practice to play a beautiful classical guitar tremolo?

In other words: Technique should be the Servant of Art, not the Master.

The quintessential tremolo piece is Tarrega’s “Recuerdos de la Alhambra” (Memories of the Alhambra). It is written as though for a solo instrument and accompanying instruments, and rather “orchestral” in nature.

The Alhambra is an otherworldly breathtaking “castle” in Spain built at various times by Muslim, Christian, and Jewish artists, and is now partially in ruins.

image

We don’t have to have an explicit “program” to latch onto the wistful moods of the A section in minor — could be “remembrances in moonlight”? — and the B section in major — “in the sun”?, or “now I really remember this wonderful place vividly”? The second A section — “Oh, it’s just a memory”?

If the solo voice were a violin or a flute or oboe, etc., we could imagine how a great musician might play it: there would be a fluid pulse, many changes in dynamics, and some in timbre. Here is what the score of the A section would look like to a violinist or flautist (leaving out a few little ornaments and showing it as it sounds):

image

The central idea here is that there is nothing about this piece that concerns the call for tremolo as “lots of notes” on guitar: Tarrega wants the player to create an instrument that can express the solo voice of this piece. Since the guitar doesn’t sustain well, and a note cannot be crescendoed, the tremolo will allow a “note” to be sustained and have variable dynamics — there is a kind of ebb and flow of dynamics indicated in the score.

This is very much in keeping with what classical romantic period guitar is all about: to make a wide range of expressive gestures that evoke cellos, brass, percussion, flutes, and, here, something perhaps like a violin or an alto flute.

Or if you wish: a new instrument that we can call “angel’s wings”.

Today we have the wonderful resources of YouTube, which are invaluable to any student of any music. We can try to find examples of “tremolo as instrument” rather than “tremolo as ‘speed’ ” or “tremolo as ‘notes’ ”.

My favorite example of a guitarist transcending the instrument to achieve this is Enno Voorhorst’s version (which has been viewed by almost 350,000 people).

https://www.youtube.com/watch?v=viVl-G4lFQ4

Please use good headphones to really hear what is going on (and also ignore the “atmospherics” of the green screen background “in the snow”). It’s also worthwhile to go to full screen to get a sense of just how relaxed he is playing this piece.

Having a good sonic image of “the music to express” really helps when working much more slowly on the techniques needed. One suggestion here is to not worry about the speed of the tremolo (or piece — they are tied together) and to instead concentrate on the balance between solo and accompaniment and then include the changes in dynamics — the louder and softer of the solo line — while keeping the background at roughly the same dynamic.

You will start to notice that Enno is not playing this at a very fast tempo — the tempo is just right — and the key to the sound is just how evenly and lightly he is playing the 32nd notes — there is nothing to remind you that they are individual sounds — the actual notes in the solo are made from all the 32nd flutters of the same pitch — and they quickly make a whole timbre of the “angel’s wings” instrument which maintains perfect line throughout the piece. You can see that his right hand is hardly moving and that he is playing the tremolo on top of the strings rather than digging into them — yet he’s able to completely control the dynamics of this line (magic and art!). The illusion here is wonderful, and fits in with the “remembrances as illusions” feeling.

This is a key to how to practice. It will take some months regardless — here you can concentrate on expression and evenness and lightness and dynamics and music and just keep everything slow, and get great enjoyment while learning.

The first few bars of this piece are very good for learning tremolo playing (one of many hints is to learn tremolo on an inner string — Tarrega plays the first E on the B string — this will help keep the right hand fingers from flying open — and playing slowly will help even more).

It’s not a competition, sport, war, body-building, or a novelty act: it’s music, it’s art.

Is there any programmer with 50 years of experience?

I’ll nominate several of my heroes:

Maurice Wilkes was the architect and builder and programmer of one of the very first stored program computers — EDSAC ca 1948–9 (which in fact is now being re-created at the UK National Computer Museum in Bletchley Park).

Wilkes lived a long productive life, and was very active right until he passed away in 2010. He was the second Turing Award winner, and received it for hardware, midware (he was the inventor of microcode), and software.

In his reflections, he recalls starting to walk up the stairs in Cambridge, and was stopped by the thought that “I was going to have to spend most of the rest of my life fixing bugs in programs”.

I think we can safely think of Wilkes as having spent 60 years as a programmer and designer.

Ivan Sutherland just celebrated his 80th birthday this year, and was feted at this year’s SIGGRAPH. Both Ivan and his brother Bert were very involved in engineering as young children (their father was a civil engineer), and they both met Claude Shannon early on. Ivan at age 12 or so worked with Ed Berkeley on SIMON, a relay computer kit that Berkeley had come up with. In his 1950 book “Giant Brains”, Berkeley credits Ivan with writing a program for SIMON to do double precision arithmetic.

That was 68 years ago.

And, yes, I started programming as a real job ca 1961–2 in the Air Force, so I’ve been at it for about 56 years now (I don’t count the paper stuff I did before, because until you get to debug, especially on a machine, “programming” is really a kind of “planning” or “intending”).

Most of the programming I’ve done for the last many decades is “meta-programming” ***** — which is still programming, but a bit different from what most people think of when they hear the term.

---

***** I was just asked about “meta-programming” but the question disappeared. Here is my reply: Well, there is the Wikipedia definition (the examples are a bit better than the definition) — but even though Wikipedia has its faults, I always look there first. So take a look at that. Metaprogramming - Wikipedia

I would change the emphasis in that article to be more on design than writing code (this is a good heuristic in general). Also the article does not mention “extensible languages” as a key example of meta-programming and meta-design (the treatment of DSLs is weak etc)

You can also get some clues from “meta”. In any kind of programming, one represents ideas and manipulates the representations. In “meta-programming” the representations are of programs and the stuff of programs and processes.
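
A tiny sketch of that last point (my own example, with made-up names like Num and Add): a small arithmetic “language” is held as plain data, and ordinary code then manipulates that representation: one function runs it, another translates it into a different form.

```python
from dataclasses import dataclass

@dataclass
class Num:
    value: float

@dataclass
class Add:
    left: object
    right: object

@dataclass
class Mul:
    left: object
    right: object

def run(expr):                  # an interpreter: one way to give the data meaning
    if isinstance(expr, Num): return expr.value
    if isinstance(expr, Add): return run(expr.left) + run(expr.right)
    if isinstance(expr, Mul): return run(expr.left) * run(expr.right)

def to_python(expr):            # a translator: the same data rendered as code text
    if isinstance(expr, Num): return str(expr.value)
    if isinstance(expr, Add): return f"({to_python(expr.left)} + {to_python(expr.right)})"
    if isinstance(expr, Mul): return f"({to_python(expr.left)} * {to_python(expr.right)})"

program = Mul(Add(Num(2), Num(3)), Num(4))
print(run(program))         # 20
print(to_python(program))   # ((2 + 3) * 4)
```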

What does Alan Kay think of structural editors?

I haven’t tried a structural editor for so many decades that I don’t think I can comment on what is around today.

Also, I think that liking these or not has partly to do with temperament and style. For example, I turn off the grammar checking in my word processors when I’m trying to write English, because I want to use all the brain cells I have for “flow” and “content” intermingled. Similarly, for programming, I want to use my limited mental facilities for design and bottom-up experiments.

We did a very early version of “tile-based” (“block-based”) scripting in Etoys (later used for Scratch) and this retained some of the bottom-up feeling while making a method.

Another related topic is “mathematical expression builders”.

On the other hand, I think Google’s attempts to anticipate what you are trying to search for are generally beneficial.

Here is the rub (I think). A big question for all systems that know about structure and might have ways to more easily build the structure is what does it take to change your mind and make something different in part or all of it? I’ve advocated treating what’s there as “flat” and being able to use a stylus to isolate and scrub the parts to be changed, etc., so there is not such a feeling of great burden. I haven’t seen anything like this (but there could be — it will be interesting to hear from readers just what is around now).

There are other related comments here. For example, when color helps and when it gets in the way (similarly, other structural delineators). In all systems — whether tiles or not — one would like to “show structure” for some purposes, but also be able to “show flat” for others (for example gisting).

It’s likely that the answer — still — for today has to do with programming needing many points of view quickly and easily supplied by its UI — and I’d like to hear of any systems that really do a good job here.

What are some ways that programming was better in the past?

This is the 19th answer — from someone who started programming in the early 60s, and had a sense of how things were in the 50s.

I expected to see in the other answers at least one that pointed out that there was always a range of “computer aided” in how programming was done, and that this range very much depended on particular cultures of computing, often that coalesced around particular physical computers.

To take the 50s for example, during a time when most people were “factory hands” with one or two batch times a day, with debugging being looking at octal core dumps, there were some who made interactive debuggers that vastly changed how development was done (check out the traditions of Whirlwind into DEC, etc.).

Similarly, there were others who started thinking about having the computer do as much work as possible with the programming task: they started to invent and build higher level assemblers and algebraic languages.

Some went cosmic (LISP was invented in 1958!)

And others didn’t want a factory production line model, so started to find ways to share “time-quanta” of the computer so that 10s and sometimes 100s could interactively program and debug.

Meanwhile, most programmers were doing no more than coping with what was there (and doing a lot of work because of this).

Almost 60 years later, I see the very same picture. Most of the programming I’m aware of is done with little more technology than slightly enhanced punch cards and “system builds” as isolated events. It is disheartening to hear what most organizations think of as “agile” programming!

Yikes!

And still there are a few people — as there are in every generation — who are actually trying to do more than cope. And they occasionally advance the level of the art a bit, while most of the others ignore the possibilities.

Did Bell Labs and ARPA/PARC ever collaborate?

In the 60s there weren’t a lot of computer researchers, and they were generally friendly. So more or less everyone knew everyone. If you look at the history of Unix, you will see cross fertilizations both from the MIT Multics research and from Butler Lampson and Project Genie at Berkeley.

BCPL by Martin Richards from the UK was very influential, and spawned implementations and adaptations at MIT, Lincoln Labs, etc., and Bell Labs (where it became “B” and then morphed into “C”). A similar process happened at CMU, where the morphed language was called Bliss (Bill’s Language for Implementing Systems Software — Bill Wulf).

One of the most influential early language systems in the 60s from Bell Labs was SNOBOL by Dave Farber, Ralph Griswold and Ivan Polonsky.

“Extensible Languages” were a big topic in the 60s, and Bell Labs contributed a few.

Doug McIlroy was a fountain of ideas there, and contributed an extensible macro system, the Unix “pipes” ideas, and many more.

Perhaps the biggest difference between the two organizations in the 60s is that ARPA had a really cosmic vision about computing and networking — involving many topics, including interactive graphics and simulations, and world-wide networks and collaboration — and Bell Labs didn’t. I think ARPA was greatly helped to have about 3/4 of the projects in universities.

In the 70s, I think Unix, C, the DEC PDP-11, and their tools created a center of gravity that influenced most of the thinking at Bell Labs. Parc was bent on making a complete first pass on the cosmic ARPA vision, and this required it to make its own computers and displays and networks rather than buying from vendors, and to do all other parts of the hardware/software as needed — and to collaborate strongly with the ARPA researchers involved with “internetworking”.

How did Xerox PARC do engineering requirements management?

I was one of the computing researchers at Parc (two of the four labs were devoted to this: Computer Science Lab and Systems Science Lab — the other two were devoted to optics and physics) from a few months after it started in July 1970 to 1981.

I never heard the phrase “engineering requirements management”, and I’ll confess I’m not sure exactly what it means (it is perhaps guessable).

Since Bob Taylor did not want to have “management”, this could even be a moot question.

However, we did have a number of really great engineers — like Chuck Thacker, who received the ACM Systems Software and Turing Awards for his work. Or Dan Ingalls in software, who received the Grace Hopper and the Systems Software Awards for his.

We also had a number of people who might be better considered “scientists” (especially from temperament), but who could put on an “engineering hat” when needed. This “hat” combined with “deeper realer engineers” often produced excellent results (my favorite is Butler Lampson designing the very complex micro-coded CPU for the MAXC mainframe that we built in the first year of Parc). My view was “Well, Butler is amazing, but Holy Shit!”.

One way to think about Engineering is that it combines “getting something to work” with “in a principled careful enduring way”. This requires toughness along a number of dimensions, including with process.

So: things are being “carefully managed” even in a larger system that has eschewed as much of “formal management” as possible.

In practice, the general aim of the computerists at Parc was to realize “the ARPA Dream” of pervasive personal computing internetworked worldwide with artifacts that could be and were replicated at the level of 100s to 1000s. This required a lot more engineering than just making a demo of a great idea.

Why do people see the fretlessness of the violin family as an obstacle? They miss notes just as much in fretted instruments.

An additional wrinkle not covered in the earlier answers is the question of playing polyphony with “as many voices as the piece demands”.

There are somewhat substantiated theories that the flatter fingerboard and bridge of the older bowed strings — before the violin family was converted ca 1800 to being able to put out a lot more volume — plus perhaps variable tension control in the “German-style” bows with slacker hair that could be tightened and loosened with the right hand to get more or fewer strings — allowed at least three voices — maybe four — to be played together without arpeggiating.

For this kind of music, the fretted viols help get the pitches of the multiple voices “good enough”, whereas the unfretted violin family presents really great difficulties when going beyond double stops or using open strings to help.

I recently took up the lute, and found that the moveable frets really help to get the important major 3rds and 10ths in a particular piece in excellent tune. I wish I had them on guitar (baroque guitars have them, but with fewer than 6 strings). I also have a vihuela with 6 courses of strings and gut frets. Changing the tuning of the third course will provide the guitar string relationships, so one can experiment with “better tunings for (some) guitar music”. This is quite interesting!

One of the main reasons given for the fixed metal frets for guitars from the 19th century onward — besides fixing on equal temperament — is much longer sustain. This is certainly the case.

But, as all guitar players know, if you are sensitive to tuning, the equal tempered 3rds (and 10ths) are somewhere between annoying and painful. (There have been experiments over the years with “microtonal” guitars with moveable metal frets…)
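
The arithmetic behind that complaint is easy to check (standard tuning math, worked here in Python for illustration): a pure major 3rd is a 5/4 frequency ratio, while the equal tempered 3rd is four semitones of 2^(1/12) each, which comes out roughly 14 cents sharp of pure.

```python
import math

def cents(ratio):
    # 1200 cents per octave, logarithmic in frequency ratio
    return 1200 * math.log2(ratio)

just_third = cents(5 / 4)           # about 386.3 cents
equal_third = cents(2 ** (4 / 12))  # exactly 400 cents (four equal semitones)

print(round(just_third, 1))                 # 386.3
print(round(equal_third, 1))                # 400.0
print(round(equal_third - just_third, 1))   # 13.7 -- the ET 3rd is ~14 cents sharp
```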

And equal temperament is not as offensive with the more mellow sound one gets from “no nails”, something I’ve just recently been able to hear comparisons of good players with and without.

“No nails” is the mainstream for lute playing (with perhaps some nails used for loud continuo playing on theorbos). I’ve found that it creates a much wider richer palette of color than using nails. “No nails” on 19th century guitars was also more normal, and this also brings an amazing richness of expression.

Here’s a YouTube video of the wonderful Anna Kowalska playing a replica of an ca. 1820 French “parlor” guitar (after Lete). She has tuned it to 415, and is using Savarez low tension strings (which she says are “the right balance between good and evil” compared to gut). See what you think!

https://youtu.be/rX_Hm71qnCk

What is the first IDE you remember using?

I think that the first UI I used that could carry out interactive development and intermingled live code and live debugging was the CAL system for the Project Genie Time Sharing System (which became the SDS-940, and was used by Engelbart for the “big demo”). CAL was ca 1965–66, and primarily made by Butler Lampson while at Berkeley.

CAL was an interesting system itself, in that it was an attempt to make a very JOSS-like end-user system, but (a) within the limitations of Model 33 teletypes, and (b) to have it be able to do instant real incremental compiling. The result was not as pretty as JOSS, but was as easy to use, and the incremental compilation allowed some serious programs to be developed on this very early small but consequential machine.

JOSS was the first great end-user programming system. It predated — and was better than — Dartmouth Basic (which wound up using a few JOSS features). It was primarily the work of one remarkable person — Cliff Shaw — who also wrote the first great paper on user interface design. This was a HW/SW design (where great care was taken to design and build every part of the system, including the interactive typewriters used as terminals).

JOSS was followed by two important systems. JOSS II was a general facility for RAND, and again with specially designed terminals — a truly beautiful system: one felt a pleasant quiver just from the knowledge that one was going to be able to use it that day. The second landmark system at RAND was GRAIL (GRaphical Input Language), which required the first great tablet-and-stylus to be invented (in 1964 — the year of the mouse) and the first great gesture recognizer to be invented and made.

Do I lose something when learning OOP in Ruby instead of C#/Java?

Here’s what I think is a good analogy, but may not have universal reach. After a layoff* of about 4 years from guitar I recently started to learn lute: also with strings and frets and played with most of the fingers on both hands. But: the lute has mostly double strings, it is tuned differently, the notations are different, etc., the hand positions are different (the right hand very different), the frets are made of gut and very low, and there are many other idiosyncrasies that differ from the guitar’s own many oddities.

This continues to be an interesting process. The thing that transferred the most and is helping the most is “what music is all about”: how melodies, harmonies, counterpoint, rhythm, phrasing, “line”, etc. work, being able to hear internally what the music is, and so forth. Then the work and fun becomes how to get the music through the instrument to add its “special stuff” to what finally leaps into the air. New skills have to be learned, etc. Some of them will take months and years.

Similarly, my take on programming something is in terms of “its music”, and for me this is always in the form of and in terms of a dynamic systems design, with different parts of the system interacting with each other, and where finding/identifying the parts, their relationships, and their interactions is the critical part of the design. In the end it will be the systems design that will either help or hinder the actual programming, the debugging, and especially the down-the-road maintenance and inevitable changes. It is worth trying to do a good job here!

Then the question becomes: what does it take to make that system with the tools at hand? This usually will require some new tools to be made from the resources available. We know we can do this, because the resources we use are Turing complete enough to allow us to make just about anything we need and want.

This gets us to the question: how much work is this going to be? And different programming systems will have different loads depending on the problems. Some of the loads might be expressive, some might require high efficiency.

Because we can’t discount the design requirements and possible learning curves we might need to get on, we could even wind up working out the design with models made in one language, and then switching to another language to carry out some of the pragmatic needs. An early 60s stellar example was the Randell and Russell Algol-60 system on the KDF-9 computer — they made two semantically equivalent systems: one a dynamic interpreter for interactive debugging, and the other a grinding optimizing compiler. The Brooklyn Union Gas company system of the late 80s was designed and prototyped on workstations using a dynamic OOP language, and the semantics of this were rendered — via PL/1 ! — on their IBM 3090 mainframes for the day to day deployment.

So: my advice is try to avoid “learning OOP” via a language. “Real OOP” is not about making abstract data types, setters, or most of the things that are typically done in them. “Real OOP” is about dynamic systems, and the way to learn it is to start designing and building them and gradually build up a feeling for scaling, modularization, intercommunications, etc. Then you will be able to choose how to use the (too many) degrees of freedom available in almost all programming languages.
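To make that last point a bit more concrete, here is a minimal sketch (plain Python, with names invented for illustration; it is not Smalltalk and not anyone's production design) of thinking in terms of communicating parts rather than data structures with attached procedures: each part can only be reached by sending it a message, and it alone decides how to react.

```python
# A minimal sketch of message-passing parts (illustrative names only).
class Part:
    def __init__(self, name):
        self.name = name
        self.links = {}                  # other parts this one knows how to reach

    def connect(self, role, other):
        self.links[role] = other

    def send(self, role, selector, *args):
        # the only way parts interact: deliver a message to a linked part
        return self.links[role].receive(selector, *args)

    def receive(self, selector, *args):
        handler = getattr(self, "on_" + selector, None)
        if handler is None:
            return None                  # unknown messages are simply ignored here
        return handler(*args)

class Thermostat(Part):
    def __init__(self, name, setpoint):
        super().__init__(name)
        self.setpoint = setpoint

    def on_temperature(self, reading):
        # reacts to a report by sending further messages,
        # never by reaching into the heater's insides
        if reading < self.setpoint:
            self.send("heater", "turn_on")
        else:
            self.send("heater", "turn_off")

class Heater(Part):
    def on_turn_on(self):  print(self.name, "on")
    def on_turn_off(self): print(self.name, "off")

stat, heater = Thermostat("stat", 20), Heater("heater")
stat.connect("heater", heater)
stat.receive("temperature", 17)   # -> heater on
stat.receive("temperature", 23)   # -> heater off
```

The point of the sketch is only the shape: the thermostat never touches the heater's state, so either part can be replaced, moved, or multiplied without the rest of the system knowing.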

—————-

*I had to give up guitar playing about 4 years ago because the bracing stress on my left thumb had broken down the base joint beyond repair. The lute has the possibility of not having to use thumb bracing very much (and this is working so far).

Why does Smalltalk not become a commonly used platform for learning presentations and video lecture constructors? Squeak looks great as an interactive presentation platform for college lecturers.

Commenting on Eliot Miranda’s answer.

Hi Eliot

Squeak Smalltalk was largely made so we could do Etoys for children: an ancestor and superset of the later Scratch.

From the child’s point of view I wanted it to be an “object media environment” with drag-and-drop scripting and massively parallel execution. The design drew from the underlying Smalltalk, Logo, StarLogo, and a powerful superset of HyperCard, in which Smalltalk’s multiple “desktops” acted as the “cards” and could be used as “slides”. This allowed a desktop/card/slide to exhibit anything, including work in progress, dynamic content of all kinds including simulations, animations, videos, sound and sound synthesis, 2D and 3D dynamics, collaboration between other Etoys systems on the Internet, etc.

This worked both as a general authoring environment, and as a superset of a presentation system such as PowerPoint because we could use the children’s scripting to easily control all the dynamics we desired, including sequencing within each “slide”, going to next slide, effects, etc.

A white paper about this system is here.

Here is a white paper about children’s use of this system to learn science.

This was done 20 years ago in the late 90s and was perhaps at its prime in the early 2000s. I gave many talks using this system, and it was definitely the most convenient of all the ones I’ve used.

The graphics — basically Morphic — in Squeak Smalltalk were of PowerPoint quality in those days, but — for lack of funds — were not improved to keep pace with the rendering engines used by MS et al. At some point PowerPoint started to look better while still doing a lot less.

In the STEPS project ca 2008 Dan Amelang did the Nile DSL which was able to get state of the art quality rendering in an amazing 500 lines of “executable math” that was also fast enough to be useful for most presentation purposes. I did many talks with the STEPS system also, but these did not involve Etoys.

I’ve been amazed that no Squeak Smalltalk people have taken the Nile system and replaced the old Smalltalk rendering with it.

Dan Ingalls et al took a pass at putting the Morphic model over JS in the browser (in the “Lively Kernel” system). This has very high quality graphics, but they didn’t carry through with a real end-user framework (like Hypercard or Etoys) that would allow it to be easily used in the manner of Squeak Smalltalk.

So, unless I’m mistaken, I don’t think that today there is an end-user authoring system with state of the art graphics and media and with the expressive power of the Squeak/Etoys Smalltalk system of ca 15 years ago.

Note: today I think the way to do this would be to use JS as an assembly language and build an entire system over the top of it, and especially to use the embedding features available to allow external media to be presented inside the system. For fun Dan Ingalls, Bert Freudenberg et al did this with the Smalltalk from 1978 — the version that was shown to Steve Jobs the next year in 1979. I used it in the same way as recounted above to do the presentation for a tribute for Ted Nelson.

Why don't more classical musicians improvise?

Lots of very different answers here, partly depending on where in the last 600 or so years one looks. Here I’ll just mention some of the parts of early music that require improvisation, especially continuo (playing “accompaniment” on a chordal instrument such as a lute or harpsichord), and especially 17th century continuo.

The most often found form to work from is a bass line with a notation for harmony but no further notes. If you are lucky enough to be the only other player besides the soloist (or another bass instrument doubling the bass line), you can also make up the harmony as well as the notes to express the harmony. Depending on the skill of the performer and the style of the piece, the “realization” can be anything from “just chords and bass” to complete contrapuntal parts with new themes.

Also, a lot of early music was written as “outlines” which could be played straight, but which most performers would not just ornament, but play “doubles” which would be further whole choruses with parts or all improvised. Examples abound here (e.g. Dowland’s lute works and songs from the early 17th century, and Louis Couperin (“the uncle”) about 50 years later for harpsichord).

Finally, as mentioned in some of the other answers, in most previous periods of classical music, a performer was also expected to be able to improvise on the spot in most forms (often by having someone in the audience sing a few notes for a theme). This art lives on, but only in a small percentage of current day classical musicians (especially organists, and some pianists — for example look up some of Richard Grayson’s wonderful improvisations in concert on YouTube).

Does a computer that has more data/information (huge amount of data) weigh more than the same type and model computer that does not have any data on it?

There is an old joke — that I heard best told by Butler Lampson — about an aircraft loading engineer who was worried about “the extra weight of data” on punched cards. It was explained to him that a large amount of data actually weighed less because “the information is in the holes” !

What majors are best prepared for jobs that don't exist yet?

I like Vishnu’s answer. I had planned to sit this question out because I don’t think “job” per se is a big enough idea or goal (just as I don’t think “making money” per se is a good enough idea or goal for business).

I think the trick is to home in on what you love while trying to notice how doing it well could also help provide food, clothing, and shelter.

Pretty much everything I was interested in strongly enough to do deeper learning wound up helping me later in life — and the field I accidentally wound up in didn’t exist when I was born, and has changed many times over the last 80 years.

So: go for happiness for sure!

What's a uni-class?

My version of “object-oriented” from 1966 on revolved around the active entities in a working system — I thought of these as being “full fledged computers” on a network and communicating via messages (this paralleled and was inspired in part by ARPA’s plans to do this with physical computers). With “virtual computers” I realized that you could do the neat thing of making everything — including the virtual networking and the messages, etc., and the stuff all this was made from — with “virtual computers” (all the way down).

Other inspirations were Sketchpad, the first Simula, etc., and — a bit later — Lisp.

The ideas that seemed solid in this scheme were the “virtual computer” and non-command messages idea (also implying encapsulation).

There also needed to be a way to make instances of “ideas for virtual computers”. Sketchpad and Simula had similar ideas (that we’d call “classes” today). There’s a top-down-ness in classes that is a bit of a barrier to “tinkering around” to “find” good systems organizations.

The first Smalltalk (-72) — see the history — was inspired by several important ideas in Lisp. One of them was the relationship of a “function” and “apply” to what sending and receiving messages might be. Another was that some version of the four or so ways to think about practical closures produced an entity that could be like an object. There is also the relationship to the practicalities of evaluating a function to the storage structures needed.

The basic idea here is that one doesn’t want to make a big psychological or action commitment about “type” or “class” until you can find out what it really is that you are trying to do. Both the language and the user interface should help with this.

For example, in Squeak Etoys, every object could be thought of as a “uni-class” because it is a single thing that can be used like an object but can also be a factory for similar objects. The language Self used prototyping instead of classes, and this is similar. Henry Lieberman wrote a great paper about prototypes and delegation as a way to make similar things that could piggyback on others but also have their own characteristics. I wrote a few screeds at Parc calling for “exemplars” — something between a class and a prototype (or perhaps orthogonal).
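Here is a rough sketch of that flavor (plain Python, with names invented for illustration; it is not the Etoys or Self machinery): one concrete object is used directly, and the same object serves as a factory for more things like itself, which can then diverge.

```python
import copy

# Rough sketch of the "uni-class"/prototype flavor: a single concrete object
# that is both usable now and a factory for similar objects later.
class Exemplar:
    def __init__(self, **traits):
        self.__dict__.update(traits)

    def copy_with(self, **changes):
        # "make me another one of these", optionally varied
        child = copy.deepcopy(self)
        child.__dict__.update(changes)
        return child

# start by tinkering with one concrete thing...
car = Exemplar(color="red", top_speed=90)

# ...and only commit to "kinds of car" later, by stamping out variants
blue_car = car.copy_with(color="blue")
race_car = car.copy_with(top_speed=200)

print(car.color, blue_car.color, race_car.top_speed)   # red blue 200
```

The useful property is that no Car class had to be declared up front; whatever "kind" there is emerges from the exemplar you were already tinkering with.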

If you can have lots of classes — and modern languages can — it is possible to use some of the existing mechanisms to make a more bottom up experimentation environment that allows a lot of flexibility, is still using real objects — not data structures with attached procedures — and that — in a late-bound language like Smalltalk — can then be “elevated” (“beatified”) into stronger abstractions as the systems design becomes more clear.

One of the reasons this is really important is that design is not running to the library to see what’s there. It is generally much better for the language and SDK to be good and flexible enough to allow “scratch programming” of a model of the system to be done easily and without pain. This will guide whether anything in the library will actually help matters. Trying to program the solution before making a model is really a bad practice.

I view the “instance producing” mechanisms and conventions in OOP as still being a not yet well enough solved problem — so it’s always good to try to get more people thinking about this.

What does Alan Kay think of this remarkable initiative?

I donated to this because I think some good might come from it. I’m not a fan of competitions — I prefer cooperations — so I think I would have at least cast this in terms of teams, which is also more realistic in what it takes to deal with most important issues and problems (Richard Eng reminds me in the comment below that this is cast in terms of teams — adding in the equivalent of what are sometimes called “friendlies” in soccer, or spring training games in baseball, will enrich the eventual competition).

Also, one of the points of this more than 40 years old language was that it contains the metadefinitions to make better programming languages. It was quite disappointing to see most people merely programming in it when it was released, as opposed to building more expressive systems in it.

So I think I’d also try to nudge this in the direction of taking a few really important things that weren’t practical for us to do at Parc, and to at least extend the existing Smalltalk to do these. For example, we really wanted “Uni-classes” in the 70s that allowed more bottom up programming — Smalltalk-72 actually had almost this — but the later Smalltalks were organized around classes. Both of these worlds — and the world of prototypes and delegation — can co-exist, and a good team effort could be to find good ways — including in the UI — to do this.

In response to a few of the comments, I’m adding something more here about “prototypes” (which I’ve also discussed elsewhere). The basic desire and need is — after coming up with something useful — to be able to create more things like it. Almost as important is that the “something” gets “beatified” into a larger concept/abstraction that can help thinking. Note that a large part of the schemes for such things in OOP languages, especially dynamic languages, are to come up with something more expressive and useful than “types” (or, from another POV, to elevate the idea of “type” into something really worthwhile).

Classes are a pretty top-down way to do this, and I’ve never liked them because I think that design is aided by being able to make things before having to decide just exactly what they are. Prototypes (perhaps with delegation) are at another extreme (in some languages they just become the root of the links, in others they are partly copied and partly linked) — they’ve always felt “too wiggly” to me. One of the ideas we talked about at Parc was “Exemplars”, which were kind of “sample or ‘typical’ objects” that would have more expressive power than a class, and more stability than a prototype. For example, in Smalltalk “Class Document” is not very explanatory, but more like a framework. “Exemplar Document” would be instantiated to make a typical document, and could also serve as a framework. Note that a typical “part” in an exemplar would likely be different from a default or a “master version” (all these are probably needed).

I’ve not particularly liked any of the gestures at these problems over the years. I think there is much needed to do here, and a really good approach to what the abstractions should really look like, and how they get made, is yet to happen.

There are many other things that could be done. For example, the Etoys framework in Squeak Smalltalk worked really well for late elementary school — one of its offspring was Scratch — but could be vastly improved (and Scratch even more so). This would be a good project to look at: what would be a great starting version and UI of a Smalltalk-like language for beginners?

What is one moonshot project (as consequential as the ARPANET) that the US government should undertake today?

First, I think that “moonshot project” is not the metaphor you really want to use here: the actual “moonshot” was a stunt that set back real space travel by at least 25 years, and is now looking like more than 50. Also, the government — because of the poorly educated nature of the majority of Americans — is all too prone to “stunts” and other “gestures” which lack the substance and the legs to be worked out properly.

By contrast, the ARPA-Parc research was done under the radar of public (and government for quite a while) scrutiny: for 12–14 years (1962 to 74–76) to yield the big workable inventions for universal personal computing and networking. This was a very good process and use of (not a lot of) public funds.

A “grand challenge vision” today for the US that seems desperately paramount would be of a not too far future American citizenry who are above threshold with knowledge of their century and centuries past, and with the perspectives and skills to argue for illumination and progress rather than to try to win.

As my friend and colleague Vi Hart once beautifully wrote: “We must ensure that human wisdom exceeds human power”.

The first stage with visions is to see if they are at all doable, and this one is. The second stage — a very difficult stage in design — is to identify the “MacCready sweet spot”: the place that is above the qualitative threshold to be able to really turn the vision into goals and problems to be worked on, that is also the most reachable.

There are many issues to be contemplated — and these are outside of your question. But consider that most Americans don’t want to acquire qualitatively better education, and most teachers, especially in the elementary years, cannot help with qualitatively better education.

In analogy to the ARPA computing initiatives, the first issue — that no one wanted personal computing or pervasive networking — was dealt with by not trying to get any kind of agreement, but simply by trying to come up with the inventions to do them, and then seeing if they could spread (the former was easier than having the ideas spread unscathed — I think more education about them was needed).

For the second, I’ve written and spoken about what could be done in analogy to how the printing press changed much of Europe and then America by being able to take the semi-capture of great minds and thoughts through writing and then carpet bomb society via mass publishing (the very start of the industrial revolution and the current civilization was the inexpensive replication and distribution of ideas). This found enough auto-didacts and changeable teachers to make the difference through quite a few generations of general society.

I think partly what is needed in analogy to deal with the current difficulties is the next logical stage of “the book” as it could be using the computer medium to carry it. This is a larger discussion than can be had here, but consider that “a book” today could teach reading itself, could help the reading of it, could be much much more than just words and pictures — or movies and sounds — but instead an entire learning environment. This does not solve enough motivation problems to work, but could make the difference in society-enforced schooling by augmenting the adults who are trying to help — “helping the helpers”.

My vote for the greatest “invented curriculum” is Jerome Bruner’s “Man: A Course Of Study”, which was created in the 60s with lavish funding and great contributors. This was killed in just a few years by a combination of politicians and parents who used it to further their own political and ideological agendas. Today, it would be worse. A terrific book about this is by Peter Dow: “Schoolhouse Politics: Lessons From The Sputnik Era”. Dealing with the backlash against “real education” is ultimately the most critical “MacCready Sweetspot” issue.

In the end, most children try to learn what is in the environment around them, especially at home and school. To make a qualitative difference in education — and before both the societies of the world and the world itself collapses — the environment for children must be qualitatively better understood, and qualitatively better changed.

Why couldn't the management of Xerox see the goldmine that they had with inventions at Xerox PARC? Was it really about lacking money to capitalize? How about stock issuance?

Both Giordano Beretta’s and Chuck Brantingham’s answers are good, so I’ll just add a bit more context.

The main sparkplug of Xerox was Joe Wilson, a very special guy who turned over the reins to his protege Peter McColough in the late 60s, who in turn set up PARC as Xerox’s long range research center. There was a lot of vision in Wilson, and a fair amount in McColough. He wanted — rightly — to get Xerox into computing, but they were not diligent about the company — Scientific Data Systems — they chose to buy with their burgeoning billions from the success of the Xerox copiers in the 60s. Now as Xerox Data Systems, they were eventually forced to close and write off a billion dollars ca 1974–75. Parc also almost was undone by this (a long story about how it survived). McColough was removed as CEO, and many problems for the company followed. This is worth checking out.

The first time when enough had been done at Parc to really start thinking about commercializing the big inventions was ca 1976, and

(a) Xerox was in trouble from various quarters, including new competitors at both the high end (IBM and Kodak) and the low cost end that they had never gotten into (the Japanese)

(b) the state of the Moore’s Law revolution allowed personal machines to be made and sold, but at prices which required them to be highly valued (or with low prices for machines that couldn’t do enough in an office — e.g. Apple started that year). However, several standalone workstation machines for office text editing using displays were starting to appear, and were being bought by companies: for example, the Vydec machine, which was about $15K. A large percentage of the cost of such machines was the memory, and this was the first technology to really be changed by Moore’s Law.

(c) Xerox really thought of itself as a copier company, not as an “information company” as McColough wanted, or a software company (which was the key to having future vision: SW + Moore’s Law = a very different kind of future).

At that time Xerox was a bit like England before the Magna Carta, with very powerful P & L divisions, which contended with each other. Parc, as a research center, was just an L center, and small to boot. One of the arguments was that a full-fledged general purpose personal computer would have pretty much the same components as a standalone display word processing machine, but could take on a much wider variety of tasks. The evidence was the Alto computer plus Bravo (now known as Microsoft Word), and numerous other apps that were used daily, the GUI, and a really interesting system, way ahead of its time: OfficeTalk — which was a prototype and model of an integrated office system in the manner of Smalltalk.

The Alto had been purposely done to provide a personal computer in the early 70s that would be like a “Moore’s Law” computer in the 1980s. This allowed the advanced SW — including the user interface frameworks — for individual users to be invented over some years of effort. This plan worked well. In the end Xerox could never wrap its head around Moore’s Law and its implications. (Note that IBM really couldn’t either — or they would never have farmed out the OS for the IBM PC in the early 80s — they never were able to seize this huge inflection point in computing.)

Xerox did recognize the laser printer after it was made (worth following up how Gary Starkweather was “banished” to Parc for pushing the idea in Rochester — we loved him!). But again they could only deal with the larger aspects of it for corporate publishing. However, they made billions from it, enough to pay for Parc well more than 100 times over.

The Japanese — e.g. Canon — pushed for an affordable desktop version, and this is what Apple eventually offered with the Mac.

What were lacking — pretty much everywhere outside of Parc — were large visions of the future. For example, Steve liked the GUI but didn’t understand the networking or how the programming was done. So the Ethernet eventually came out as an agreement between Xerox, DEC, and a newly formed company. Apple never understood the Internet despite being told over and over about it.

As another example, one of many things that really needed to be done to commercialize the Parc suite of HW-SW was to make special purpose silicon CPUs to be able to do what the Alto could do with higher level language processing. This did not happen. Intel made an attempt with the 432 but tried to do too many features. The most impressive special purpose chip done in the early days of the 80s was the 3D graphics GPU done by Jim Clark — used for the Silicon Graphics machines — and first worked out at the ARPA project at Utah, Stanford, and then Xerox Parc (using the Mead-Conway design and fab process done there).

Apple later had a chance to acquire Silicon Graphics — as the “7 years in the future Mac” — to get on that learning curve (another “Moore’s Law” curve that they didn’t understand — the workstations of today are the commodity personal computers of the future — but they passed up that chance).

So … “something” did happen in fits and starts and in different places. But it really wasn’t as well thought out as the integrations of hardware, software, user interfaces, and networking that were the hallmark of the ARPA-Parc research community. And we are paying for the willy-nilliness and “worse is better” craziness to this day.

My conclusion from being a part of this and watching it happen over decades is that most people, most companies, etc., are very uncomfortable — and unequipped — for large and accurate visions, and this included both Apple and Microsoft.

(a) They generally prefer to take their risks where they think they are low and where they think they know something, rather than to develop a background that will give them a stronger way to see into the exponentials that have been the hallmarks of our era.

(b) They are also uncomfortable with “baseball” rather than “golf”. In baseball you are doing well if you hit .350, because hitting a round fast thing with a round fast thing is very difficult. The 65% of the time you don’t get a hit is not “failure” in baseball but overhead for getting the .350. (Failure in baseball is failing in technique, such as not catching a flyball; you should do this successfully more than 98% of the time — the analogy here is that you should be able to successfully build any hardware or software system you design.) ARPA-Parc — via Licklider — was sure that, given the vision and goals, batting .350 or .400 would qualitatively change the world, and that the other 65% was “overhead” for qualitatively changing the world. This is what happened.

This idea really works — it’s another exponential — but businesses and most people don’t understand it at all, or well enough to have faith in it. Clayton Christensen’s book “The Innovator’s Dilemma” is not very good on a number of counts, but one of the worst aspects of it is that he excuses higher management for not doing well with new ideas — because they are new, because they could compete against the existing lines of business, etc. — but my view is “what else is higher management really being paid for if not for the knowledge and chops to deal with the future?” — and the context is that it is the 20th and 21st centuries that must be understood in order to have a chance at being part of and inventing the future. Higher management should not be excused for “flunking their century”!

What are some great stories from Xerox PARC?

There are quite a few online, and this would be appropriate because the ARPA-Parc research community put quite a bit of effort into inventing personal computing and pervasive networking so that people by themselves and with their friends and colleagues could find out important and interesting things online.

So: I first typed “Alan Kay History” into Google, and on the first page found “The Early History of Smalltalk” on Bret Victor’s site — I wrote that in 1993, and it has quite a few good stories about Parc in it.

I was surprised that “The Power of the Context” (2004) didn’t come up on the first page, but this can also be found via Google (try “alan kay the power of the context”). This one is especially useful because the references are pretty extensive, and many of them have online links.

One of the references there is Mitchell Waldrop’s “The Dream Machine”, which is by far the best history of the larger streams that created ARPA and then Parc. Lots of very good stories here, most of them at least “more or less true”. Patrick Collison tells me that Stripe Press is going to reprint this very worthy book.

Two other references that are vital — and have a few good stories in them — are the histories by Butler Lampson (Parc’s “Oppenheimer” and more), and Chuck Thacker (Parc’s hardware genius and all around genius). These are both available online.

Then I tried “xerox parc history” with Google. This was more random — e.g. ignore Malcolm Gladwell’s “imaginings” in the New Yorker — but I did find on the first page “Xerox’s Legendary Palo Alto Research Center — IEEE Spectrum”. This is a terrific piece, mostly by Tekla Perry, that takes a sharp look at the interplay of personalities etc. there.

There’s quite a bit more, but this is the way I try to answer questions using the technology that is there. It’s surprising that I can’t explain what I’m looking for to even a semi-AI yet, but just a little digging will find pretty good primary material.

Alan Kay, could you give an example of when software development has driven hardware development?

One way to think about how software helps computing is that it introduces “late-binding” into machinery. The very idea of a “changeable stored program” allows early-bound mechanisms to take on different purposes. I’m a very big fan of the Jacquard loom — not just for its ability to be driven by different programs — but for the wonderful tolerant engineering that Jacquard devised to allow it to work so well so early in the industrial revolution (ca 1800). (A lot can be learned about “how to think like a great engineer” by studying this design.)

The idea of “registers” to hold parameters is another late-bound idea (compare to the many dials that had to be set by hand in e.g. ENIAC (which also had a few registers)).

Putting a Jacquard like program in memory is a very late-bound idea — I like Turing’s formulations as an example, and then the Goldstine-von Neumann adaptation of this idea for getting modern computing going. A big problem was how to make enough volatile memory to do anything. Some of the first attempts were acoustic delay lines (using mercury), Williams tubes (using persistence of charge on CRTs), rotating drums (later), and finally the invention of core memory by Forrester for Whirlwind in the early 50s.

A great idea that has been used off and on — and should be available to programmers on more CPUs — is microcode, invented by Maurice Wilkes in the 50s to allow late-binding of how basic instructions and other low level features could be made with the hardware.

Early machines did not have index registers — they are another example of late-binding something. They did not have stacks to hold return addresses or previous contexts. They did not protect memory from random or malicious errors. Etc. They had essentially no features for the advent of higher level languages such as Algol in the late 50s and early 60s.

A machine architecture ahead of its time — and still today — was the Burroughs B5000 in 1961–2 by main designer Bob Barton — which was entirely driven by software needs. These included virtual memory and exchange of segments with a swapping drum, fine grain “capability” protection of everything, multiprocesses and automatic process switching — thus multiple CPUs — a comprehensive stack structure that could hold procedure activation records as well as return addresses, and also intermediate results of expression computation. Instructions were compact “byte-codes” fetched 4 at a time. Etc.

Besides all the automatic processes implied by the above paragraph, let me mention a few that might not be so obvious. A simple one is that the protected (by a tag bit) “descriptor” for any kind of object contained useful information beyond just the address: if an “array” it would have the length of the array, and this would be automatically checked in parallel against the index to be used so it was impossible to store outside of an object. Another one is that a procedure descriptor could be stored where an array descriptor was expected, and the right thing would happen for both fetches and stores (i.e. the B5000 could simulate “objects”, “data”, etc.).
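As a toy illustration of what that buys you (a sketch in plain Python with invented names, not the real B5000 microarchitecture), here is the shape of a tagged, length-carrying descriptor: every indexed access is checked against the descriptor itself, so touching anything outside the object is simply impossible, and a procedure descriptor can stand in wherever an array descriptor was expected.

```python
# Toy illustration of a tagged, length-carrying descriptor (not real B5000 hardware).
class Descriptor:
    def __init__(self, kind, payload):
        self.kind = kind            # the "tag": 'array', 'procedure', ...
        self.payload = payload

    def fetch(self, index):
        if self.kind == "array":
            data = self.payload
            if not (0 <= index < len(data)):   # checked on every access
                raise IndexError("descriptor bounds violation")
            return data[index]
        if self.kind == "procedure":
            # a procedure descriptor where an array was expected: run it,
            # which lets a computation stand in for stored data
            return self.payload(index)
        raise TypeError("not an indexable descriptor")

table = Descriptor("array", [10, 20, 30])
squares = Descriptor("procedure", lambda i: i * i)

print(table.fetch(2))     # 30
print(squares.fetch(5))   # 25, fetched "as if" it were an array element
# table.fetch(99) would raise, instead of silently reading other memory
```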

The “byte-coded virtual machines” for higher level languages that have been used for many years are emulations of some of the features of the B5000, but on the B5000 they ran at top “machine speed”. At Xerox Parc, we — Chuck Thacker was the main designer — made these machines for our languages by making microcoded computers where the microcode ran many times faster than main memory could cycle (this allows pure hardware speeds for language emulation to be obtained on many occasions).

Another example can be drawn from the Parc architectures: multiple program counters with zero-overhead task switching. This was first done on the TX-2 of Wes Clark at Lincoln Labs, famous among other things for being the computer that Sketchpad was done on. The Parc Alto had 16 program counters, each pointing into the microcode. Attached to each program counter was “lookaside” logic tied to various conditions in the hardware represented by a bit, for example: a disk sector just passed, the CRT retrace pulse just happened, a two millisecond tick just happened, the ethernet card had something of interest, etc. A simple scanner looked to see which was the highest priority program counter and selected it for the next microinstruction. Thus up to 16 microcode tasks could be seamlessly interleaved using a (carefully) shared register file. This allowed most normal hardware to be eliminated — the normal controllers for display, disk, keyboards (including music keyboards), pointing devices (including the mouse), other I/O, etc. were all emulated. So was “bitblt”, the screen painting primitive. So was the Smalltalk virtual machine (and other languages’ virtual machines), etc.

(This is how we were able to make an affordable buildable personal supercomputer in 1973 — eventually almost 2000 of these were built in the 70s. And this is why the large accomplishments in software that happened at Parc were possible: the software needs drove the hardware designs very very deeply.)
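For readers who want the flavor of that task-switching rule, here is a very rough simulation (plain Python, invented names; the real Alto did this in hardware on every micro-cycle): each task owns its own program counter and a request bit that some hardware condition can raise, and on each cycle the highest-priority requesting task executes exactly one microinstruction.

```python
# Very rough simulation of priority-scanned microcode tasks (illustrative only).
class MicroTask:
    def __init__(self, name, priority):
        self.name, self.priority = name, priority
        self.pc = 0                    # this task's own program counter
        self.wants_service = False     # raised by a hardware condition

def run_cycles(tasks, wakeups_per_cycle):
    for cycle, wakeups in enumerate(wakeups_per_cycle):
        for t in tasks:
            if t.name in wakeups:      # hardware condition raises the bit
                t.wants_service = True
        ready = [t for t in tasks if t.wants_service]
        if not ready:
            continue
        t = max(ready, key=lambda t: t.priority)   # the "scanner"
        t.pc += 1                                  # one microinstruction
        t.wants_service = False                    # assume its burst is done
        print(f"cycle {cycle}: ran {t.name} (pc={t.pc})")

tasks = [MicroTask("emulator", 0),       # lowest priority: the CPU emulation
         MicroTask("display", 5),
         MicroTask("disk_sector", 10)]   # highest priority

run_cycles(tasks, [{"emulator"},
                   {"emulator", "display"},
                   {"disk_sector", "emulator"},
                   {"emulator"}])
```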

Much later, the software needs for graphics rendering eventually — a terribly long time except in the Silicon Graphics machines (done by an ARPA guy and pioneered at Parc) — resulted in various GPUs.

A reversal of the progress made — but still to your question — is when a low level language — C — was taken as “popular” and hardware started to be made to cater to this. This started to hurt development of really high level languages, and continues to this day. (I should note here that having a universal intermediary language as a universal target for higher level semantics is a very good idea — and this represents something that could be assisted by hardware — the problem is that C is not very good at representing what needs to be modeled for advanced computing …)

Just to pick one of many crazy things in today’s CPUs: it’s worth looking at just how slow a full process switch is in most commercial CPUs these days. Combine this with a very poor way to think about MMUs (things are different than they were in the 60s!) and it’s hard to make hundreds of thousands to millions of protected processes. There are enough gates to do a very different architecture that would be much more safely parallel, etc. This is quite missing what is actually needed for various scales of computing.

Interesting hybrid architectures — e.g. Intel has some — are CPUs that combine a “C based” processor — like an ARM chip — with FPGA programming. Much can be accomplished here if the memory available has high enough bandwidth and capacity. Intel tends to do this poorly.

Another miss was to take a good idea — the “Harvard” two cache architecture — and make it too small to do a good emulator. How to do this well is something that readers should try to work out for themselves.

Another thing worth pondering is how to get enough memory bandwidth to multicore architectures to make them really worthwhile.

I think it’s fair to criticize the current lashups of hardware available today as being kind of randomly helping or hurting software. From the perspective of many decades, it seems more like the hardware vendors are mostly coming up with designs that make their own processes easier rather than being interested in software development.

What did Steve Jobs mean when he said that Apple is fundamentally a software company? Was he right?

My slogan in the Xerox Parc years was: “Hardware is just software that is crystallized early”. In other words, the proper business of computerists is dealing with process and processes. Large ideas will lead to software architectures, UIs, languages, and development systems. At some point the necessary but least convenient parts of a system — the hardware — will become clear.

This was a reflection on the ARPA computer research community I grew up in (and its predecessors in the SAGE project that started with Whirlwind). This community was constantly coming up with new process ideas, simulating them on existing hardware, and then designing and building new hardware that better fit what they wanted the processes to do.

One way to think about this is that the job of hardware is to optimize software (including getting it running at all). Unfortunately this was never made clear enough to Intel and Motorola — despite a lot of efforts on our part in the 70s. So we have the unfortunate backwards dance of HW vendors producing CPUs etc and getting the software people to make it look good.

This was one of several reasons that we decided at Parc to make all of our own SW and HW — and did. This allowed us to simulate the software and its environments beforehand, and then to design and make machines that would be efficient in supporting the ideas.

The CAD -> SIM -> FAB cycles are in general used in most serious engineering fields, but serious engineering in computing is still lagging (e.g. note how long it took for PC vendors to finally include a GPU, etc.).

Steve didn’t understand this very well in the early days of Apple, but could see that the Parc GUI was a SW idea that forced new HW (the Lisa and Mac). As he said in a later interview, he missed that the way we programmed also required new HW (we did, Apple didn’t), and so did the way we networked things together (ditto).

I think his stint at NeXT got him to understand SW a little more, and the Pixar folks really understood it. So he had the idea more strongly when he came back to Apple (hence his use of another slogan of mine that he liked: “People who are really serious about software should make their own hardware”.)

How much percentage of code is shared between software projects, e.g., between a video game and a word processor (in terms of STL, OS, etc)? Is this percentage comparable to the percentage of genetic similarity between mammals?

One answer to this question can be gleaned by asking yourself how many different text editors you have to use in a week (for example, right now I’m using the Quora editor, I use one for email, another for when I have to chat, and others for writing papers, commenting in blogs (Quora has a different UI for their comment editor …), etc.).

They do share some properties, for example, they all use the “modeless” text editing that was invented at Xerox Parc. But the Quora editor, for example, and many other editors found in the browser, are not modeless at the next level (right now I’m in an editing mode and I will have to hit the “Submit” button to even see how what I’m typing right now is going to look — this was done away with at Parc (i.e. WYSIWYG), but separate presentation modes have crept back). What can be embedded or not is different, etc. I often have to attach “really rich text” documents to emails because the email editors are weak.

And so forth.

Now we have to look at the systems aspects and the underpinnings. If the editor were a “real object” that was viewed by a “real view”, the same text/editor could be used everywhere (this was done in Smalltalk at Xerox Parc in the 1970s).
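Here is a minimal sketch of that idea (plain Python with invented names, not Smalltalk's actual classes): one text object, and every place that displays it is just another view attached to the same object.

```python
# Minimal sketch of "one real text object, many real views" (illustrative names).
class TextModel:
    def __init__(self, text=""):
        self.text = text
        self.views = []

    def attach(self, view):
        self.views.append(view)

    def insert(self, position, s):
        self.text = self.text[:position] + s + self.text[position:]
        for v in self.views:
            v.model_changed(self)     # every context re-renders the same object

class View:
    def __init__(self, label):
        self.label = label

    def model_changed(self, model):
        print(f"[{self.label}] {model.text}")

doc = TextModel("Hello")
for place in ("mail composer", "chat box", "paper editor"):
    doc.attach(View(place))

doc.insert(5, ", world")   # one edit shows up identically in every "editor"
```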

If you don’t have real-objects/real-views, and don’t have late-binding, etc., — i.e. if it is difficult to write and use real systems in one’s programming environment — it can be quite a chore to simply embed standard components and be able to use them in an integrated fashion.

It’s puzzling and annoying that this is still the case in 2018!

Did astronauts on Apollo have the means to kill themselves in case their mission failed?

I was in the Air Force in the early ‘60s, and part of the flight training then — and likely still today — was to experience various effects of high altitude in an “altitude chamber”. The one we had at James Connally AFB in Waco, Texas would hold about 25 people. One of the drills was to buddy up and have one of the pair take off their mask and copy some writing from a book. This was done at 18,000 feet, 25,000, 35,000, and 45,000 feet.

The experiences were the same at each altitude. One remembered writing, and the next thing one remembered was the buddy holding the mask over your face. The writing would be OK for a bit and then would turn into a scrawl. There was no memory of the change.

I don’t remember the intermediate times but 18,000 feet took about 2 minutes, and 45,000 feet was about 20 seconds.

The reason there was no sensation is that our bodies can’t sense low oxygen (some animals can) but we sense only high carbon dioxide (which makes us gasp). But under high altitude conditions there is no gasping because the partial pressure of carbon dioxide allows it to continue to exchange out of our lungs. Meanwhile there is not enough oxygen entering.

The purpose of this was to scare the shit out of us so we would constantly check to see if oxygen was flowing while we were flying. (It worked!)

The other memorable experience with the altitude chamber was explosive decompression. This was done four people at a time in the airlock to the larger chamber. They’d take the large chamber up to 45,000 or higher and then suddenly vent the airlock to it. One of the warnings was to have your hands in front of your face to catch your mask as the air in your lungs would blow it off (despite the straps). Another was to have a bowel movement before doing this! One of the jokes was that you could tell the highest rank that had been in the airlock by how bad the smell was.

I wonder now whether they still do this in quite the same way as in 1962. Both of the exercises are moderately dangerous: they let us pass out four times from lack of oxygen, and the explosive decompression could badly damage one’s inner ears, etc. Another exercise that they might not do these days is the ejection seat simulator, which is quite a whack (some ejection seats could pull enough Gs to compress one’s spine, etc.).

As to the question, regardless of how much astronauts might strive to fix the situation, they would have all gone through these experiences in the ‘60s and would know that just lowering the partial pressure of the air they were breathing would accomplish a completely sensation-free send-off.

What’s the best book about Charles Babbage in terms of technical detail?

I would also vote for the Swade book.

In a computer design course I taught many decades ago, I asked the class to design the computer that Captain Nemo would have had on the Nautilus (ca 1865).

A few students tried the Babbage route with mechanics. Most went for electrical schemes involving relays (Morse’s telegraph was ca 1845). As far as I can recall, no one thought about pneumatics and “fluid logic”. This technology also started developing around the 1840s, e.g., with the “Barker lever” force amplifier for large pipe organs.

It’s worth pondering what each of these technologies brings to the table, and also their difficulties.

I’ll just point out here that a big problem with most mechanical schemes is to do what is called “restoring logic” so you don’t have to have the signal also carry the propagating power needed. I once got a chance to actually crank the London Science Museum Difference Engine, and it just barely works; it is at the limit of what you can drive mechanically.

Danny Hillis told me that after he and Brian Silverman did the TinkerToy computer to play TicTacToe — which had a hard time working — they realized that they should have tried to find some way to do mechanical restoring logic — it was just too hard to drive signals (in old computer logic terms, the “fan out” was much too shallow).

By contrast, the Jacquard looms do not have to do a lot of work. The sensing and warp lifting is done by pressure against restoring springs (of the “needles”) and lifting weights (on the warp threads).

The pressure needed to apply the “cylinder” (a rectangular solid with holes in each of the faces) with a card on top against the pins that are connected to the springs is minimal. The main work is to move the “cylinder” back while advancing a card. This can easily be done by a single person or by an engine.

If you look closely at how a Jacquard loom works*, you will be struck by how few complications he wound up with (one of many wonderful principles was to use flexible yarn to route pull-up forces rather than to try to use more normal mechanical means, e.g. the “trackers, stickers, squares, roller-boards, etc.” of the pipe organs of the day). The mechanisms were almost flimsy — not really, perhaps: “no more than absolutely needed” — but were also very easy to replace and fix. One of the main principles everywhere is “large tolerances” with “registrations when needed” — this works extremely well!

If we compare the two designs, the Jacquard loom is “light, airy, easy, graceful, flexible, tolerant, beautiful, aikido, fixable, etc.” — and works! — and the Babbage mechanisms of the Difference Engine were quite the opposite in every regard. (A wonderful charette for engineering design students!)

Babbage correctly saw that the Jacquard punched cards were a great idea, but he wanted to apply these to the mechanics of the Difference Engine. Almost certainly, what he should have done is to start with the Jacquard idea and think the whole rest of his engine through again from scratch (there are some interesting implications in the “logic” that Jacquard invented to make simple decisions, and of his highly tolerant way to do the mechanics … ).

Ada’s famous “Notes” on her translation of a paper about the Analytical Engine were written in 1843, right at the time when there were more technologies that could have been considered …

Relays give you a way to do restoring logic when you need it, but if you take a look at Aiken’s Mark I at Harvard, or Stibitz’s complex number calculator, also done with relays, it is not a good route to making a computer.

Fluidics on the other hand can be done by etching glass plates. The gates can be small, it is easy to do restoring logic, it’s faster than relays, etc. Easy to do a display, etc. I.e. Wow!

---

I realized belatedly that I should give as reference the excellent book from which I learned the details of the Jacquard loom: (1888) The Jacquard Machine Analyzed and Explained, by E. A. Posselt. Thanks to the wonderful Brewster Kahle and his Archive.org, this book can be found and downloaded via Google (it was a lot harder for me to run down this book years ago …).

What were the best lessons learned at PARC’s Learning Research Group?

Too big a question for Quora-sized answers (I would have to explain what we did and how we did it, and the last time I did that it turned into a 50 page history for the ACM).

But one thing I did learn that doesn’t require the historical knowledge to understand is that the way to “let the completers complete” and to keep working on new ideas is to form new groups as needed.

Most ideas are mediocre down to bad, but even the few that are better than that are still easier to have than it is to do the serious software engineering needed to make the previous ideas real. Many ideas arise from seeing the old ones more clearly, and this often happens during development.

I made the mistake at Parc of trying to get my research group to switch to new ideas too often and didn’t deeply understand enough that as “completers” they really wanted to complete (and it is very good that they did over and over because it was the goodness of the actually working stuff that made the most impact on the rest of the world).

Even so, this stellar research group did 5 different Smalltalk systems — all working — in a little less than 10 years, one following the next every two years.

The way to solve the problem of ideas in the theater is to just do a new play with a new company, and you wind up with several plays that are worth pondering that can cause further ideas to flow. Later I realized that this path could have been taken in our group.

Can someone link me to the original white paper discussing what was eventually coined OOP? I believe it was by Alan Kay, but I'm not able to find it simply in a Google search. I'm finding it referenced, but not linked to, and I don't recall the name.

Depends on “which OOP” you mean. The kind of language that C++ is, was first created in Simula I, in the mid-60s, and Nygaard and Dahl wrote a paper about it in the CACM in 1966. They reference an earlier document they wrote in Norway in 1965.

The idea of associating procedures and data together goes further back, and has appeared independently several times. You can read about some of these schemes in my “Early History Of Smalltalk”.

I saw some of these when I started programming in the Air Force ca 1961, but I was particularly struck by Ivan Sutherland’s Sketchpad system (his thesis was MIT, 1963, I read it in Nov 1966). It was like magic. A week later I read the Simula documentation, including the paper above, and realized that Simula had a similar structure to Sketchpad, wasn’t as nice, but did have the open-ended feature that you could write programs — even kludges — to make things happen.

What I called “OOP” came from the realization that (a) a “real object” should be semantically equivalent to a whole computer, (b) that this would be perfectly universal, (c) that if messaging were used — as was contemplated in the ARPAnet — enormous scaling was possible, (d) that everything should be an object (for many reasons), and so on. Besides many analogies to networks of computers, and processes within time-sharing systems, my background in biology got me thinking about -real scaling- (far beyond the means of computing at that time), and my background in math got me thinking about “algebras” of communication.

This wound up with an image of something rather different from Simula, and created quite a few years of work on designing both HW and SW to run something this universal, fast enough, and small enough.

How would Apple have benefited if Steve Jobs were technical?

This is a very complicated question to answer in a fair and comprehensive way. Let me apologize in advance for its length.

Some context: Steve and I were friendly from after the Parc visit until he died. I think that he can’t be too highly praised for many of his attributes, especially courage, determination, communicative abilities, and will.

First, we have to ask a question about “technical”. Technical as in being able to fix a computer, technical as in Woz, or technical as in Chuck Thacker? Or as in an expert in computer products, or as in a skilled programmer today, or as in Dan Ingalls or Butler Lampson?


Here’s Steve in an interview in 1995, years after the visit to Xerox Parc and then the Lisa and Mac, talking about the visit to Parc:

(This URL seems to work in both the US and the UK, the segment I’m referring to starts around 6:35. Perhaps someone can fix this link if it doesn’t.)

We can see that what appealed to him was what he could see — basically the bit-map screen and the Smalltalk GUI — and the other really important systems aspects — such as networking and how and why we did it, and advanced programming and how we did it — made no impression.

(Note also that he catches himself praising Parc too highly and starts downgrading his enthusiasm. To get an idea of what he saw, take a look at this video tribute I did for Ted Nelson a few years ago using the genre of Smalltalk that Steve saw in 1979: a rescued Smalltalk from 1978 that we brought back to life ca 2014 — the main difference here and what he saw is that more memory today allowed more bit-map pictures to be used — but he did see the painting system and that exact elf, that graphic system, the multiple desktops, etc..)

Years later at NeXT, he was working with people “more technical” than most of the people at Apple in the late 70s and early 80s — people who knew a lot more about both hardware and software, about operating systems and objects, etc. The result was a more technically sound product, many of whose fruits were brought back to Apple when Steve returned.

So Steve wasn’t “technical”.

However, a person even more important to personal computing than Steve was Bob Taylor — one of the four key ARPA funders, and the person who set up Xerox Parc. Bob was also not technical. He was a psychologist who had the wonderful disease of wanting interactive computing for everyone, the same disease that Steve later contracted. Bob was also charismatic and a person who made things happen.

The biggest difference between them was that Bob put a lot more care into deciding what technical people to trust with the problem finding and the ways to make progress. This was partly because of Bob’s vantage point of being a researcher at NASA, and then funding people like Doug Engelbart and Ivan Sutherland at ARPA.

The result of Steve being Steve was that some not good technical decisions were made by people he trusted — and some by him — mixed with some better technical decisions — and then he was able to get many of the conglomerations successfully to market (the latter should never be discounted — it is really difficult, and Steve can’t be too highly praised for getting the stuff into manufacturing and the marketplace).

But in the end being able to replicate an idea is one thing, getting the worthwhile ideas to manifest in the first place is quite another — both require super high quality people — and rather different people.

Another complication that needs to be mentioned for context is that the start of the 70s and the start of the 80s were two completely different times and situations. No one else was working on powerful personal computers in 1970, but in 1980, there were already quite a few 8-bit micro hobby computers — including the Apple II — quite enough to get IBM interested enough to enter the market in a big way in 1982.

We knew in the late 60s pretty much how much computing power was going to be needed to pull off a “real personal computer” with a generally useful screen and GUI, and that it would need at least a 16-bit processor, etc. Moore’s Law calculations said “by the end of the 70s or into the 80s”. The Parc Alto was made to “compute like a decent personal computer of the mid-80s” but starting in the early 70s. This gave us a head-start on the much harder design and software system building problems required for a real personal computer. Most people don’t realize that the Alto and Smalltalk had been running for more than 6 years when Steve finally saw them for the first time. Nor do they realize that there were thousands of Altos by then, connected by Ethernet, with file servers, page-a-second laser printers, a Parc “Internet”, etc. Steve was shown the whole suite of things needed for real personal computing.

One of the key factors of the 80s to the present day is the combination of competition and a generally unsophisticated market. This means “early to market” is a big deal (because something really good down the line won’t be valued enough to displace the poorer early movers). This makes it really difficult to work on “important good things that take time”. The lure of “sell now” tends to kill off “what is actually needed”. This happened in most areas of hardware and software, and created poor de facto standards — and “redefined normal downwards” — making it difficult both to do good things and for most people to have the perspective to criticize them.

Marketing people would generally prefer a product not to have a learning curve, so they would much rather cater to what people think they want rather than what they need (what could actually not just really help them, but help to qualitatively change the ways and power of their thinking). This leads to “consumer marketing and products” and something like a “pop-culture”.

Sic transit gloria mundi.

What did Alan Kay and Steve Jobs talk about at the 2007 iPhone keynote?

Steve and I first met during his visit to Xerox Parc in 1979 at the urging of some of his computer people to see what we’d been doing for the previous 8 years. You can get an idea of this by taking a look at a revived Smalltalk system from the year before (1978) that several of us brought back to life a few years ago after its 40 year slumber on a rescued disk pack that Xerox had thrown away.

I used this to make a tribute for a Ted Nelson celebration. The main difference today is “more pictures” (we didn’t have enough memory back then to use more than a few). We showed Steve many of the demos that are on the Ted Nelson tribute video, plus a few others not on the video.

This subsequently formed the basis of the Lisa and the Mac a few years later (the Mac was done in 1984, 11 years after the more powerful and capable Parc Alto, which started working in 1973).

Later, I was chief scientist of Atari for a few years (81–84), and Steve and I would periodically have lunch. The last year of Atari was a collapse and I eventually accepted Steve’s invitation to come to Apple.

In 1984, Time or Newsweek asked me my opinion of the Macintosh, and I said “The Mac is the first personal computer good enough to be criticized”. Internally at Apple, my first memo had the title “Have I got a deal for you: a Honda with a one-quart gas tank!”. Steve did not like this memo, but what could he do given the history, and that it was quite true?

Steve and I remained friends (I was the go-between that brought him together with the people who were to become Pixar).

I think he invited me to the 2007 iPhone unveiling partly because it was kind of a tiny “Dynabook” — and he had always wanted to do one — and partly because he was going to use a quote of mine that he had always taken to heart “People who are really serious about software should make their own hardware”.

The photo of us chatting:

[image]

was taken right after the event. He brought the iPhone to me, put it in my hands, and asked: “Alan, is this good enough to be criticized?”. My reply was to make a shape with my hands the size of an iPad: “Steve, make it this size and you’ll rule the world”.

When the iPhone had been revealed a few minutes earlier I realized that they must already have done an iPad/Dynabook-like machine (easier) and that the “iPhone first” must have been a marketing/timing decision.

If programming languages are written using programming languages, how was the first language written?

Another incomplete offering …

It’s worth looking at “deep bootstrapping” because it helps gain experience with “meta leverage thoughts”. For example, what if there is only one computer with only machine code? Here you have to use the bit-patterns that correspond to machine-code instructions. What if you are both lazy and ambitious at the same time? I.e. you don’t want to muck around in the bits for any longer than you have to, and you also want to make a high level programming language as quickly as possible?

One way to think about this is that you want to make a few really powerful “bricks” without a lot of hand-coding that will then allow bootstrapping something that will bootstrap your desired language (and maybe you’ll want one more bootstrapping phase in there).

One of the many fun and profound things about Lisp is that it only has a few primitive “functions” — car, cdr, cons, atom, equal, lambda, read, etc. — and most of these are extremely simple and small in any typical machine code. One route would be to code these by hand and get a minimal but universal Lisp going to then use as the basis for the next levels of HLLs.
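
To make the flavor of this concrete, here is a toy sketch in Python (my own illustration, not any historical implementation): a handful of hand-coded primitives plus a small eval/apply loop is already enough to run user-defined functions, and it is the kind of minimal kernel the next levels could be bootstrapped on.

    # A toy "bootstrap Lisp": a few primitives plus eval/apply.
    # Hypothetical illustration only, written for this answer.

    def atom(x):                 # an "atom" here is anything that is not a list
        return not isinstance(x, list)

    def eval_(expr, env):
        if isinstance(expr, str):            # symbols are variable lookups
            return env[expr]
        if atom(expr):                       # numbers etc. are self-evaluating
            return expr
        op, *args = expr
        if op == 'quote':
            return args[0]
        if op == 'if':
            test, then, alt = args
            return eval_(then if eval_(test, env) else alt, env)
        if op == 'lambda':
            params, body = args
            return ('closure', params, body, env)
        fn = eval_(op, env)
        vals = [eval_(a, env) for a in args]
        return apply_(fn, vals)

    def apply_(fn, vals):
        if callable(fn):                     # a primitive coded "by hand"
            return fn(*vals)
        _, params, body, env = fn            # a user-defined closure
        return eval_(body, {**env, **dict(zip(params, vals))})

    # The handful of hand-coded primitives
    global_env = {
        'car':  lambda x: x[0],
        'cdr':  lambda x: x[1:],
        'cons': lambda a, d: [a] + d,
        'atom': atom,
        'eq':   lambda a, b: a == b,
    }

    # ((lambda (x) (cons x (quote (2 3)))) 1)  =>  [1, 2, 3]
    print(eval_([['lambda', ['x'], ['cons', 'x', ['quote', [2, 3]]]], 1], global_env))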

Another approach is to make a very simple byte code interpreter — with byte code implementations on the lines of the Lisp functions — and then use a technique like that of Meta II (Val Schorre in 1964) to install a simple but powerful metacompiler written in itself that can bring up further versions of itself and of other HLLs (complete examples are given in the very neat paper from 54 years ago).
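
Along the same lines, here is a sketch of what such a tiny byte-code machine could look like; the instruction names and encoding are invented for this illustration and are not Meta II's.

    # A toy stack-based byte-code machine: the few hand-written ops
    # a metacompiler could then target. Invented names, for illustration.

    def run(code):
        stack, pc = [], 0
        while pc < len(code):
            op, arg = code[pc]
            if op == 'PUSH':
                stack.append(arg)
            elif op == 'ADD':
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == 'JMPF':               # jump to arg if top of stack is falsey
                if not stack.pop():
                    pc = arg
                    continue
            elif op == 'HALT':
                break
            pc += 1
        return stack

    # (2 + 3)  =>  [5]
    print(run([('PUSH', 2), ('PUSH', 3), ('ADD', None), ('HALT', None)]))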

BCPL (Martin Richards) was a transportable systems programming language used in the 60s (it was the ancestor of “B” and “C” and “Bliss” etc). It was written in itself in terms of byte-codes, and as above, one wrote the byte-code machine by hand, brought up BCPL, which then had the tools to both improve itself (write a compiler for BCPL to the new machine) and to implement other languages and systems.

The first Smalltalk (-72) was written as a one-page interpreter similar to Lisp, except the equivalent of “eval/apply” was sending a message to an object as a pointer to the message stream (kind of a more general FEXPR), and the object could use Meta II-like techniques to parse the message on the fly. Getting this one-pager going jumped all the way to a VHLL with dynamically extensible syntax.

From the 60s on we have had more than one computer at our disposal, so we can think of ways to bootstrap with more tools. People write cross compilers, but I advocate that being the last thing done. You are much better off bootstrapping various interpreters in the early part of the game.

An interesting wrinkle is to write an interpreter of the target machine on our development machine. This makes a lot of sense, because then much can be done in the arms of our development tools before having to drop bits on the target.

I had the fun of doing this ca 1965 when I was programming at the National Center for Atmospheric Research. The head systems programmers didn’t trust Control Data so they were determined to do their own systems software for the not quite released 6600 supercomputer, and they asked me to write a simulator for the 6600 — on our existing CDC 3600 — so it could be debugged. The 6600 was quite radical compared to most HW architectures, and it was quite fun to make something that would emulate the machine speedily enough (this was written in 3600 assembly code to get maximum speed).

This turned out to be prescient because soon after this there was a crisis announcement of “no code from CDC” for the 6600 supercomputers. Those who were going to get these machines — Livermore, National Center for Atmospheric Research, Sandia, Los Alamos, etc. — pooled what they had done outside of CDC to see what they could come up with. We all wound up moving to Chippewa Falls Wisconsin (Seymour Cray’s home town, and where he had his lab) for a good part of a year to piece together a software suite that would allow the machine to be used.

The 6600 simulator I had written was pressed into service again — I rewrote it in 6600 assembly code (using the NCAR assembler that had been debugged on my 3600 emulator) in order to make a debugger for the 6600 (Cray had neglected to put interrupts on the machine and it was very hard to control). This simulator used the machine itself as much as possible to retain speed (which was intertwined with how the main memory cycled — the 6600 could run at 10 times the speed of main memory).

Bottom line here is that emulating the target HW seems like a lot of work when bootstrapping, but it allows much better development and debugging, so what is eventually bootstrapped is much more of a self-sufficient system.

Similarly, it is always useful to make a simulation of the language even when it is to be compiled. This helped over and over in the experience with Smalltalk and its bootstrapping adventures (e.g. how Squeak was made and deployed).

What is this "live objects" in Smalltalk? I've gotten used to that "edit-compile-test-debug" cycle, and want to understand the philosophy behind Smalltalk (Pharo).

John Thorpe’s answer is a good start. Now extend the idea to the entire system and think of what we used to call “objects” (now we have to call them “real objects” since “objects” were co-opted in the early 80s) as software computers semantically equivalent to computers on a network like the physical computers on the Internet (these ideas were invented in the same time frame by the same research community).

In both cases we have active systems that not only provide services to users, but which also have to provide services to themselves (maintenance, improvements, etc.) while continuing to run.

In the case of Smalltalk, the idea was that some of the active objects would be the interactive development “kit”, some of the active objects would be code, some would be translators to the VM, some would help debug, and some would deal with errors in non-destructive ways.

So most Smalltalks have been up and running for many years providing services, doing development, etc and have never been down, have never gone through a typical edit-compile-load-debug cycle, etc. The equivalent of the latter is done while the system is running, and the changes take about 1/4 second to take effect.

This is extreme late-binding. In Squeak Smalltalk for example it is theoretically possible to make a change in the deep system kernel and replace the old one while the system is still running (I don’t know of anyone actually doing this).
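
Python is nowhere near as late-bound as Smalltalk, but it can show a pale version of the idea: behavior can be rebound on a class while instances are live, and existing objects pick up the change immediately, with no stop/compile/reload cycle. (In Smalltalk this applies to essentially everything, including the development tools themselves.) A toy sketch:

    # Rebinding behavior on a live class: existing instances see the change
    # at once, without restarting anything.

    class Account:
        def __init__(self, balance):
            self.balance = balance
        def report(self):
            return f"balance = {self.balance}"

    acct = Account(100)           # object created with the old behavior
    print(acct.report())          # balance = 100

    def report_v2(self):          # a "fix" developed while the system runs
        return f"Account holding {self.balance} units"

    Account.report = report_v2    # late-bound: the change takes effect immediately
    print(acct.report())          # Account holding 100 units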

There were many contributors to the design and integrity of this scheme, but the main credit should go to Dan Ingalls (I just “did the math”).

For completeness, I should note that we are talking about a more than 40-year-old system, and quite a bit more along these lines could be done today. Here is a revived Smalltalk from 1978 that I used in a tribute to Ted Nelson.

Why did Alan Kay once say GNU/Linux was a budget of bad ideas? What is generally wrong with GNU/Linux, and Unix-like OSes?

I should learn to keep my big mouth shut (not the first time I’ve had this thought!).

The reason is that it is easy and OK to praise something in a sentence, but I think that a complaint should be backed up with careful discourse. Part of the problem here is that presentations that are “oral” are now being recorded and preserved. I would not have bothered to complain in print about operating systems because I really wouldn’t like to expend the time to explain all the things that need to be explained to back up such a complaint.

And Quora is not a right-sized forum either. So let me try some meta observations, and we can see if the readers can turn these into balanced observations.

---

This section added Feb 11th 2022

To be balanced, I should have included some of the “bad ideas” we put into Smalltalk in the 70s: for example, using “Lisp-pointing” to relate everything: we needed late-binding, but not the dependencies! Etc.

I didn’t say much about scaling difficulties, including that even good ideas usually have half-lives that are much shorter than ours. (So the Lisp hacks really worked well for us in the small and in the early days. Etc.)

---

One of the good ideas in the original Unix was to minimize the privileged kernel to the most important allocation and protection of time, space, and communications, and to have everything else be done in standard Unix processes. Another very good idea was Doug McIlroy’s insight of “pipes programming” to allow modules to intercommunicate via messages and interpreters in each process. Skipping forward to my favorite version of Unix, we have Gerry Popek’s “Locus” architecture in the 1980s, which could migrate processes over networks, balance loads, and have them run seamlessly on different hardware types.

Linus’ original kernel for GNU had much the same simplicity.

The meta views can start with thinking about the Internet as we have it today, and what kind of machinery is needed to allow it to be used as a computing resource. There is TCP/IP, which disappears the lower level machine idiosyncrasies. An idea contemporaneous with Locus was Berkeley Sockets that smoothed some more things out but missed the bigger ideas in Locus. The much bigger and better idea about how to go about things — from Dave Reed’s 1978 MIT PhD thesis — got missed.

Quite a few things in computing get done for immediate reasons and conditions and then neither scale nor are redone. (The “willy nilly” house metaphor used in one of the other comments is apt.) It’s worth pondering just how much early binding (trying to save one extra indirection) has cost in systems building over the years by making dynamic reconfiguring difficult to intractable. The HW manufacturers have not helped by making interprocess switching and communication so slow — and so few.

C could be better, but since it doesn’t have to be coded in directly but can be a target for code generators of many kinds, we can see that the big problems in and around C-like schemes for programming and OS building are those of “linking and loading and binding”.

So could we imagine an Internet where each hardware has a very simple SW kernel to allocate its resources, and the rest of what’s needed for any application — processes — can be dynamically loaded (and cached when this is a good idea)?

The basic meta idea here is that the more static the dependencies are, the more difficult it is to reformulate a system as both resources and requirements change. And (to take one of Reed’s points) the more the definition of “time” is taken away from the CPUs and simulated by the systems themselves, the more coherently scaling can be done.

This is not an extensive enough answer, so I apologize in advance. I’ll probably return to it a few times if I can think of better ways to help visualize the actual problems.

Why is functional programming seen as the opposite of OOP rather than an addition to it?

I hope for all our sakes that I can make this short …

In the latter part of the 50s John McCarthy got more and more interested in what he started to call “Artificial Intelligence”. He was also doing some consulting and this brought him in contact with the SAGE air defense system: large systems of very large computers attached to radar stations and each other and usable by graphical display systems with pointing devices.

[image]

John’s reaction was “Every home in America will have one of these”. He could see that the networked computers could be thought of as an “Information Utility” (as a parallel to the existing utilities for electricity, water, gas, etc…) and that the terminals in the homes could provide many kinds of “information services”. Among other things, this got him to advocate that MIT etc do “time-sharing” of their large mainframes …

He also realized that the computer milieu of the 50s — machine code and the new Fortran — did not intersect well with “most people in US homes”. This got him to write a paper in 1958 — “Programs With Common Sense” — and to suggest that what was needed for the user interface was an active semi-intelligent agent — the “Advice Taker” — that could interact with users in their commonsense terms, could learn from “taking advice”, could problem solve on behalf of the user and itself, and so forth (MIT AI Memo 17).

This got him thinking about how to implement such an Advice Taker, whose main mechanisms would be various kinds of logical deductions including those that required actions. There wasn’t much to go on back then but a few gestures at “list processing”, so he decided to invent a language that could be used to make the Advice Taker (and other kinds of robots), and more generally allow symbolic computation to take its place alongside the existing numerical computation.

John was an excellent mathematician and logician, and so he also wanted to come up with “A Mathematical Theory of Computation” to put ideas old and new on a firmer basis.

His result was LISP (for “LISt Processing”). I have written elsewhere about its significance.

Meanwhile, he was pondering just what kind of logic, math, and programming (he thought of these as highly intertwined) could be used to deal with a robot in the real world.

<eliminating detail here> A conflict was between at(robot, Philadelphia) and at(robot, New York), which could not hold simultaneously, but could hold “over time”. This was like the problem of contemporary programming where variables (and sometimes even files) would be overwritten — basically, letting the CPU of the computer determine “time”.

This destructive processing both allows race conditions and also makes reasoning difficult. John started thinking about modal logics, but then realized that simply keeping histories of changes and indexing them with a “pseudo-time” when a “fact” was asserted to hold, could allow functional and logical reasoning and processing. He termed “situations” all the “facts” that held at a particular time — a kind of a “layer” that cuts through the world lines of the histories. cf McCarthy “Situations, Actions, and Causal Laws” Stanford, 1963 prompted by Marvin Minsky for “Symbolic Information Processing”.

One of the ways of looking at this scheme is that “logical time” was simply to be included in the simulations, and that “CPU time” would not figure into any computation.
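
A toy sketch of the flavor (the names and representation here are my own, not McCarthy's notation): facts are asserted at pseudo-times and never overwritten, and asking about a "situation" is just a read at a chosen pseudo-time.

    # Facts are asserted at pseudo-times and never overwritten; a "situation"
    # is whatever held at a given pseudo-time.

    class History:
        def __init__(self):
            self.events = {}                       # fluent -> [(time, value), ...]

        def assert_at(self, fluent, value, t):
            self.events.setdefault(fluent, []).append((t, value))

        def holds(self, fluent, t):
            past = [(when, v) for (when, v) in self.events.get(fluent, []) if when <= t]
            return max(past, key=lambda p: p[0])[1] if past else None

    h = History()
    h.assert_at('at(robot)', 'New York', t=1)
    h.assert_at('at(robot)', 'Philadelphia', t=5)

    print(h.holds('at(robot)', 3))   # New York      (the situation at t=3)
    print(h.holds('at(robot)', 7))   # Philadelphia  (the situation at t=7)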

<more detail excluded here> This idea did not die, but it didn’t make it into the standard computing fads of that day, or even today. The dominant fad was to let the CPU run wild and try to protect with semaphores, etc. (These have the problem of system lockup, etc., but this weak style still is dominant.)

Systems that have used part or all of John’s insight include Strachey’s CPL, Lucid, Simula, etc. Look at Dave Jefferson’s TimeWarp schemes, Reed’s NetOS, Lamport’s Paxos, the Croquet system, etc.

To pick just one of these, Strachey in the early 60s realized that tail recursion in Lisp was tantamount to “a loop with single simultaneous ‘functional assignment’ ”. And that writing it this way would be much clearer by bringing the computation of the next values for the variables together.

There are no race conditions possible because the right-hand sides of the assignments are all computed using the old values of the variables, and the assignment itself furnishes the new values for the variables all at once. (Looping and assignment can be clean if separate “time zones” are maintained, etc.)
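
In code, the contrast looks something like this toy Fibonacci step (Python's tuple assignment happens to be a simultaneous assignment: all right-hand sides are computed from the old values before any variable changes):

    # Sequential assignment: the second statement already sees the new 'a',
    # so order matters and the old value has to be saved by hand.
    def fib_sequential(n):
        a, b = 0, 1
        for _ in range(n):
            old_a = a
            a = b
            b = old_a + b
        return a

    # Simultaneous ("single functional") assignment: every right-hand side is
    # computed from the old values, then all variables step forward at once.
    def fib_simultaneous(n):
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b    # next values come only from the previous "layer"
        return a

    print(fib_sequential(10), fib_simultaneous(10))   # 55 55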

More mainstream: big data systems have used versions instead of overwriting, and “atomic transactions”, to avoid race conditions.

Back to McCarthy and — now — objects. One of the things we realized at Parc was that it would be a very good idea to implement as much of John’s “situations” and “fluents” as possible, even if the histories were not kept very long.

For example, this would allow “real objects” to be world-lines of their stable states and they could get to their next stable state in a completely functional manner. In the Strachey sense, they would be “viewing themselves” with no race conditions to get their next version.

This would also be good for the multiple viewing we were starting to use. You really only want views to be allowed on stable objects (/relationships) and this can be done by restricting viewing to already computed “situational layers”.
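
A rough sketch of the idea (my own toy, not the Parc design): each object is a world-line of immutable versions, the next version is computed purely functionally from the current one, and viewers only ever see completed, stable versions.

    from dataclasses import dataclass, replace

    # Each state is immutable; "stepping" computes a whole new version from the
    # old one, so viewers only ever see stable, already-computed layers.
    @dataclass(frozen=True)
    class ClockState:
        ticks: int = 0

    class Clock:
        def __init__(self):
            self.worldline = [ClockState()]        # history of stable versions

        def step(self):
            current = self.worldline[-1]
            nxt = replace(current, ticks=current.ticks + 1)   # purely functional
            self.worldline.append(nxt)             # visible only once complete

        def view(self, pseudo_time):
            return self.worldline[pseudo_time]     # any past situation is viewable

    c = Clock()
    c.step(); c.step(); c.step()
    print(c.view(0).ticks, c.view(3).ticks)        # 0 3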

Parc was also experimenting with “UNDO” and the larger community was starting to look at “parallel possible worlds reasoning”.

The act of programming itself also wanted to be in terms of “histories and versions”, and systems should be able to be rolled back to previous versions (including “values”, not just code). cf Interlisp, and especially the PIE system (done in Smalltalk by Goldstein and Bobrow).

This was another motivation for “deep John” in future systems. I.e. do everything in terms of world-lines and “simulated time”. A recent paper by Alex Warth shows some ways that “Worlds” can be quite fine-grained. http://www.vpri.org/pdf/tr2011001_final_worlds.pdf

The last point here is that “Histories R US”. I.e. we need both progression in time for most of our ideas and rememberings and we also want to reason clearly about how every detail was arrived at (and to advance the system).

John McCarthy showed us how to do this 60 years ago this year and wrote it down for everyone to read and understand.

So: both OOP and functional computation can be completely compatible (and should be!). There is no reason to munge state in objects, and there is no reason to invent “monads” in FP. We just have to realize that “computers are simulators” and figure out what to simulate.

I will be giving a talk on these ideas in July in Amsterdam (at the “CurryOn” conference).

Why is engineering one of the hardest fields to be creative? What should engineering education look like instead?

I don’t think that it is intrinsically hard to be creative in engineering.

An important part of engineering is that “the bridge must not fall”, “people must not get hurt”, etc. So people “called”* to engineering always have this at the back of their minds.

I used to utter the quip “Don’t ever let scientists and mathematicians design your airplane. (Because they are even happier when it fails! — they might learn something!)”

These considerations cast a conservative tinge over most engineering. However, because of the great advances in CAD<->SIM, it is now possible for considerably wilder safe experimentation to be done with design ideas. The deep concerns about safety are embedded into the fidelity and comprehensiveness of the simulations, and this allows the designers to be “more creative”.

---

* it really is a “calling”, as to medicine, science, religion, etc.

Do you recommend receiving a bachelors, masters or doctorate in computer science?

I recommend the lifelong process of becoming ever more educated, with particular attention to getting above important thresholds of perspective and knowledge.

Because we can’t take “education pills”, we have to work to get our brain/minds to learn and understand. This means that “becoming educated” is centrally autodidactic. Most highly educated autodidacts I know used universities as part of their process. This is because universities can be extremely fruitful places to discover the existence of many ideas, and to encounter many perspectives — universities are great complementary processes to individual reading and exploring. They also can help greatly in giving a sense of the thresholds that need to be reached for various stages of “real education”.

Computing is new enough to still be tricky, it is now lucrative enough to entice, and is seen as enough of a need by the commercial sector for it to heavily influence what universities (and hence what high schools) present as “computing”.

Just taking the first of these three points, I have always recommended that students not major in computing while undergrads. Both the newness and the “mostly making things” tendencies argue for a lot of context to be able to deal with ideas, fads, methods, novelty, “pop culture”, etc. It will help more to pick one or two majors in “really hard things” — I’d put “real science” first (which will require “real math”), and then “real engineering” (mechanical, civil, electrical, bio, etc.).

Under this kind of plan, there could be a terrific “serious minor” in various aspects of computing and design. The “real major” subjects both provide analogies and great BS detectors for the fashions in a design centered field.

If we look at the other two points from paragraph 1 above, then “enticing for jobs” and commercial pull is really going to muddy the whole idea of “education” because it tempts every faction into “vocational training” instead of “life educating”. (I happen to like “training” but it is a very different ballgame, and some self-discipline has to be exerted to make both work.)

These problems have been going on for enough decades to have seriously compromised the actual reasons for universities to the point where many academics can’t see the level of pernicious problems that are close to wiping out fields that overlap with money making.

Two questions you (all) should ask deeply and seriously are “What if the word ‘science’ was really taken seriously in ‘computer science’?” and “What if the word ‘engineering’ was really taken seriously in ‘software engineering’?” Careful answers to these will reveal the extent of the “Emperor’s New Clothes”.

To me, grad school is primarily about being around “interesting influences” — could be people, projects, things, etc. This is to some extent true for undergrad, but grad school is all about these. Unfortunately, many of the same complaints about computing in universities also apply even to today’s best grad schools, including even what they decide to teach and research.

(Recently I was at one of these absolutely best places, found they needed to make advances in “strong AI” (aka “real AI”) and asked why don’t you just do it? They said (astoundingly) that their sponsors wouldn’t fund it. I said but what about funding from the university? They said that doesn’t happen any more… This is not just outrageous, it is enormously dangerous.)

There’s more that could be said about the problems of doing “real research” anywhere these days, but let’s leave it for now. The main point to me is that (a) you still need to go where the interesting things are (b) you need to do a lot of work to understand good meanings for “interesting” before you start looking around, and (c) you could have a lot of difficulty doing “real research” even in a good school these days. (One definition is “You are only doing research when you can still change your mind” …)

When Ivan Sutherland was asked “What is a thesis?” he said “Something 3 people will sign”. So, to me, the most important act in grad school is choosing your eventual signers. They have to let you do what you need to do, but they also have to help keep you from being trivial and facile.

And, yes, you should always finish your degrees. You can think of many reasons why. A deep one is that it is a very good thing to learn how to work on extended hard projects, how to hang in there, and how to finally “stop doing real research” enough to not change your mind any more so as to get something done. The combination of multiple perspectives and being able to do something with them is not a bad way to think about “getting educated”.

What do you think of the statement that "an object-oriented approach to application development makes programs more intuitive to design, faster to develop, more amenable to modification, and easier to understand"?

An all encompassing answer from before objects: “You can write COBOL in any language”. (For COBOL, you can substitute Fortran, machine code, spaghetti, etc.)

I quite agree with the other (3 at this point) answers.

This is because clean and elegant software development depends on clean and elegant designers and builders. (In the early 60s, there were some quite beautiful apps done in macro-assemblers.)

“Clean-and-elegant” usually leads to thoughts about more suitable tools and general outcomes, and these intertwine with dealing with some of the problems of designing anything. My 1966 insight that since computers were universal, you could use computers sending and receiving messages as semantic building blocks for everything, was my start on thinking about systems organizations and methodology that might help programming. The encapsulation would prevent interiors from being compromised or discovered, and the computerness would allow the interiors of objects to also be decomposed into networks of computers.

This was initially a set of “math ideas”, and there were definite software engineering problems to making such ideas practical. This took some years to address by a number of us.

The wonderful “Whole Earth Catalog” appeared while these ideas were being pondered. The editor, former biologist Stewart Brand, was particularly interested in “Whole Systems”, and the catalog listed many of the most important books about systems, design, and systems design.

I wish I could rhapsodize about all of the best ones here, but for our current purpose let me single out Christopher Alexander’s “Notes on the Synthesis of Form” (his 1964 PhD thesis* (architecture) from Harvard just a few years before, turned into a book, now a classic).

Alexander got interested in dealing with the complexity and mutual interference of myriad “requirements” in large scale design (e.g. such as a whole new village in India). This led to a search for clustering approaches that would reduce the overall spaghetti problems of a large system. He used the Socratic/Platonic dictum that “You should divide Nature at her joints as a good carver would” as a mantra, and tried to come up with ways that could identify the joints (including a Fortran program he wrote that could look for clusterings that would have minimal communications outwards).

If there was a module tool, then good clusterings could be put inside modules recursively, and the result would be a set of modules within modules with minimum external dependencies. This would induce “simplicity” along several important axes, etc.

We can see that this isn’t quite the same as having the modules wind up being “natural” and “intuitive”, but the “Nature at her joints” idea will guarantee some coherence with intuition.

These kinds of design approaches were critical for the Xerox Parc work because the hardware we could build was so tiny relative to what we wanted the software to be able to do. In practice we kept doing the software — lots of inventions needed for personal computing — and every two years doing a new development system and language — the software would suggest better abstractions in which it could be represented. The practical progress made here was primarily because we had a real genius — Dan Ingalls — who could deal with enough of my ideas for abstractions and could ignore them when needed, but could make complete practical systems — many quite beautiful — and do them again and again.

His results (with the help of the whole Learning Research Group) were good exemplars as answers to your question. Rephrasing: I’d say that the better you get at design, the more late-bound versions of what you mention above will help you to achieve your goals.**

——————

* Alexander later repudiated this approach, as he had changed his mind about architectural design methods, but the book is quite a gem of thoughts and ways to approach complexity.

** That being said, I think one of the biggest problems yet today — we left large parts unanswered — is the many kinds of dependencies that pollute attempts at good systems design. I think that some of the good ways to deal with these will still involve modules, but a rather different approach to systems organization.

What is the difference between Alan Kay's definition of OOP and Carl Hewitt's Actor Model?

Not a lot of difference. You can find the history I wrote about all this online in “The Early History Of Smalltalk”.

Basically, I had the “dynamic object as whole computer” idea in Nov 66, highly influenced and catalyzed by Sketchpad and Simula (especially the former), the initial designs for the ARPAnet, current ideas about processes as virtual machines in time-sharing and multiprocessing systems, and analogies from my math and biology backgrounds.

I could see that the extensible language ideas of Irons could also be a way to receive messages — and extend the syntax — in a system of dynamic objects. I poked at this in my thesis.

I met Seymour Papert in 1968 and this changed my outlook on personal computing, led to the Dynabook idea, and got me to start thinking about what children should be doing to learn “powerful ideas”, and especially with the medium of computing.

I realized I should finish my thesis, and put off the Dynabook for the next project.

But I found Carl Hewitt’s PLANNER language at MIT — a very important precursor (and superset) to Prolog. This again almost did me in because it was clear that it would be wonderful to program “kind of like this”.

At Parc, with thesis work in the past, I started to think about a children’s language in which they could do Logo-like things, Object-like things, and also PLANNER-like things; for example monkey-and-banana and block stacking kinds of problems.

This led to the design of Smalltalk-71, which got interrupted by “a bet, and a hack” that immediately produced Smalltalk-72, an object language described in one page, McCarthy-style, whose objects were kind of like closures that could parse their inputs. I gave a talk at MIT in the Fall of 72 to share these ideas, and the first Actors paper adapting some of these ideas appeared the next year.

Subsequently, the Actors work remained more true to the original object ideas than our Smalltalk work at Parc. This is partly because our aim was to “invent practical personal computing on a personal computer” and to use the object oriented languages for this purpose: we did a major rewrite every two years, with a genre change every 4 years, to do what was necessary to get a real system with a UI and lots of capabilities going and deployed.

A few years ago we brought back to life — from a rescued disk pack that Xerox had trashed — the Smalltalk we did in 1978 for the portable Notetaker computer, and I then used it to do a tribute for Ted Nelson. Worth peeking at. This system was the one that Steve Jobs saw the next year in his famous visit in 1979.

Actors research produced some very important results. (And of course, the Parc work had a large impact generally.) But the next steps for PLANNER like languages, and especially for combining PLANNER ideas with object ideas, have had less lift and success. These are still good ideas, and people should start thinking about them again.

When did the CAD->SIM->FAB engineering process become formalized?

I’m not sure it has, but I think anyone really interested in engineering should find out how things were done before digital computers, and then before analog computers.

There was still a lot of “D” that resulted in models, and those models could be tested/simulated in a variety of interesting ways. Take a look at bridges, ships, airplanes, buildings, etc.

One of the trickiest things was to be able to deal with scaling in realistic ways. What does a wind tunnel really tell you from a model? What does a model ship really do in waves? And so forth. The “simulations” were much less accurate than can be done now, but the key ideas go back into antiquity.

The “FAB” part has always been even more difficult, so testing on the artifact was generally only done when it was as easy to make as a model.

Why is object-oriented programming more about messaging than objects?

I’ll give a short answer here and then return later for more.

If you are sending messages to “something”, then what kinds of messages do you want to send? For example: are you commanding, are you requesting, are you suggesting, are you negotiating?

If you can send messages to “something”, then you are not tightly bound to them forever, you can change your mind, and they can change also.

If you can send messages, do you really want to try to send to specific targets? It could be better to send out requests by need, and also tell the system that you can supply x, y, and z to the commonweal.
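
A toy sketch of that last idea (a little broker of my own invention, purely to illustrate it): the sender publishes a need, suppliers advertise capabilities, and the system does the matching rather than the sender naming a target.

    # Instead of addressing a specific object, a sender publishes a need and the
    # system routes it to whatever has advertised it can supply that capability.

    class Broker:
        def __init__(self):
            self.suppliers = {}                     # capability -> handler

        def advertise(self, capability, handler):   # "I can supply x to the commonweal"
            self.suppliers[capability] = handler

        def request(self, capability, *args):       # "someone, please do x"
            return self.suppliers[capability](*args)

    broker = Broker()
    broker.advertise('convert-to-fahrenheit', lambda c: c * 9 / 5 + 32)

    # The requester never names the object that does the work.
    print(broker.request('convert-to-fahrenheit', 100))   # 212.0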

Is it possible to learn any programming language by doing reverse engineering?

I’m not completely sure of your exact meaning for “reverse engineering”. But at the University of Utah ARPA project the newest grad student was given the toughest task no one there wanted to do.

The one given to me when I arrived in 1966 was to “get Algol going on the university mainframe” (a batch processing Univac 1108). There were several large mag tapes, two thick listings of machine code, and a little documentation on the system (the latter was pretty much incomprehensible because it was written in Norwegian that had been transliterated into English (simple sentences worked but many terms came out in odd non-standard ways)).

Norwegian, because this was actually the first Simula system, which was piggybacked on the Case-Western Reserve Algol done on the predecessor machine in 1965.

Here skipping the preliminary forays, we eventually took the machine code listing on fan-fold paper and spread it along the 150+ foot long hallway of the Merrill Engineering Building. Three of us grad students crawled* over this listing with colored pencils marking things and yelling out discoveries and questions. After a few days of this we had found most of the main sections, and especially that the storage allocator was not like that of Algol, but was allocating and recovering instances of Algol blocks in ways that allowed them to persist over time. We eventually realized that what the first Simula called an “Activity” was what Sketchpad called a “Master” and what the next version of Simula (and we) call a “Class”, and that what Simula called a “process” was what Sketchpad (and we) call an “Instance”.

That allowed us to read the spotty documentation, and the “code crawls” also found the hooks to an OS (such as it was). After a few tries (no debuggers on this machine) we got the system running, and I was able to get the Simula “hospital simulation” (a kind of job shop model) running.

The big deal for me was to see that Simula was a “poor man’s Sketchpad” on the one hand (it didn’t solve constraints automatically, or at all), but on the other hand, because of the way you could program it, it would allow many things to be explored that Sketchpad wasn’t able to handle.

Most important was that the two “almost-object” models in Sketchpad and Simula — thought about together — were the same abstraction as the virtual machine processes in a time-sharing system and the individual computers that were going to be on the ARPAnet (then in development). This led to a big light going on, and what I call “Object-Oriented Programming” (a very bad term, I should have called it “Agent Based Programming”).

—-

*an interesting UI for code perusal!

** It’s worth pondering that Lisp is a kind of programmable meta-language, where the trade-offs are very different than most people are used to. It’s kind of a universal material, so it rewards deep design thinking, but many people flounder because it requires more imagination than most languages. I.e. trying to reverse engineer Lisp is very illuminating if you look at how a Lisp is made, but just trying things will likely leave most programmers quite in the dark.

Are the greatest works of a field performed early on in the field’s lifetime?

Well, let’s take Physics or Biology, etc., as examples, and let me ask you what do you think?

This is a hallmark of a “real field”. It’s not clear that Computing has gotten there yet — one clue is that “real fields” build on the great work in their past, and “pop culture pursuits” don’t (and generally do “a variety of things” but without much growth). I think of Computing as — not completely — having devolved to the latter, but others might disagree.

How can I see my design work with a critical eye?

Mark Twain advised writers looking at their own writing: “If you see a phrase you particularly love, strike it out without hesitation!”.

It helps to “create with love” and “debug with disdain”. If you are critical too early in the process you will kill ideas before they have any chance to surface. I write my ideas down and put them away so I can have them but not get entranced by them.

Scientists use other scientists to help them debug (in many ways, “Science” is the whole community that helps deal with fondly held ideas and noise).

I was greatly aided in my 20s by gradually transitioning from deriving identity and self-worth etc from worrying about the goodness of my results to worrying about the quality of the effort I was putting in, and the quality of sustaining the effort on things that seemed to be important visions, issues, and problems.

This takes a lot of the glitter out of the creation, and one can still be successful in the quality of effort process by being extremely critical about the designs.

I would venture that most colleagues I’ve worked with over the years might say that I’m much too critical. I hope they would also think that I sustained efforts pretty well.

What are the best resources for learning systems and networks according to Alan Kay?

Let’s see if we can pool efforts to come up with a good answer to this question — especially with more current sources than some that influenced me.

I liked the discussion of systems and design in Christopher Alexander’s “Notes on the Synthesis of Form” from 1964. Books by both Bertalanffy and Waddington were excellent.

My favorite molecular biology book has been “The Molecular Biology of the Cell” by Alberts et al. “The Machinery of Life” by Goodsell has the most wonderful illustrations to help visualize what is going on. A good one on development (morphogenesis) is “Endless Forms Most Beautiful” by Carroll. A more recent one — perhaps to be read before Carroll’s book is “Life Unfolding” by Davies. I found another recent book “Cell Biology by the Numbers” to be really helpful.

I’m very partial to Minsky’s “Society of Mind” for an approach to a systems design for “psychology”.

For an example of large scale systems thinking in computing, try David Reed’s 1978 MIT Thesis on “NetOS”, an operating system design for Internet sized systems. I got a lot of insights from various papers by Butler Lampson (see his website).

There are many more, but let’s see what others come up with here.

What has happened to Alan Kay's View Points Research Institute website?

Looks like a combination of an overzealous bot plus human error. Should be back in a day or so.

What are the main successes of the AI group at Xerox PARC during the 70s and first half of the 80s? How much influence did it have on other groups (like Smalltalk group) and vice versa?

Right now I can’t produce the level of answer this question deserves. I’ll have to do some homework and then come back to add and fix.

There were many stellar “Strong AI” (have to use this because “AI” has now been colonized by smaller goals and scopes) and related researchers at Parc — including (I’ll miss many important ones in this pass) Danny Bobrow, Mark Stefik, Terry Winograd, John Seely Brown, Richard Burton, Brian Smith, Ira Goldstein, Kurt VanLehn, etc. I will include Warren Teitelman here because his MIT thesis was a deep AI thesis and he wound up becoming the main force behind Interlisp, with special contributions in various kinds of system reflection.

There were also many researchers who made extended visits.

I will punt on trying to list the major systems produced at this point, but: KRL, PIE, Steamer, etc.

I can answer the last part of the question. Besides the intrinsic interest of “what lies behind intelligence?” and “what can and should be done with real intelligence?”, there are also deep and interesting questions about tools, for example languages to make languages, especially representation and inference languages, ontologies, etc., and questions of what is actually needed to make “experts”, and what is needed to make “experts” less “brittle” in the larger world of knowledge and thought.

My view was that many of the AI researchers were working on the very problems that were needed to understand and create the next generation of programming languages and development systems for pretty much everything. For example, I was quite taken in the late 60s by Carl Hewitt’s “PLANNER” system (much more than just something that also happened before Prolog). I thought it could be combined with object ideas I’d been having to make a whole new way to think about programming and representing.

This was my initial plan for Smalltalk. Fate intervened with something else. But these ideas reflected back to Carl Hewitt in his later “Actor” ideas, which are more like the early Smalltalk ideas, and which went beyond in a number of ways.

To just give one more example, in the PIE system, Ira Goldstein and Danny Bobrow did a quite wonderful major permutation of Smalltalk to institute both “roles” (they called them “perspectives”) and “world-lines” that used some of McCarthy’s “fluents” ideas to allow a much stronger control and modeling of time. Both of these were major advances, and should have led to much more (they wrote 4 papers that were published as a Parc “Blue Book”).

I’ll leave this here for now …

What advice would Alan Kay give a curious individual to improve their ability to think and learn? Is there a place to see his library — every book, person, and research he has studied?

The people I’ve admired — many from afar (such as Bertrand Russell), some as friends and colleagues (such as Jerome Bruner) — share the traits of trying to (a) find ever more perspectives on things they are aware of, (b) find more things they weren’t aware of, (c) interrelate what they’ve learned, and (d) make mental spaces for their own ideas where only the “perfume” of what they know lingers[*].

A useful ploy is to consider oneself quite blind, and to have the goal to find ways to see, however murkily. If the murkiness can be assessed to one degree or another, then some footholds can be established to help parts of the process get “above threshold”.

It might be an oversimplification, but I think all of this has a lot in common with the kinds of processes needed to learn and play classical music, modern science and math, etc., meaning that both the “art” part and the “technique” part have to be developed beyond “certain thresholds” in order to be effective and more than just gestures. This can be excruciating sometimes — for example, I love music but require quite a bit more practicing than most musicians for comparable results. This was just the luck of the genetic draw. But the love helps the work and the art, and I think this is an important general key to making progress.

A good slogan is “Technique should be the Servant of the Art, not the Master”. This is tricky to manage because technique is really needed; it is easier to advance; it can impress others to the point of misleading both them and the artist; and it often makes simplicity difficult to grasp and achieve.

Despite being a lifelong autodidact and voracious reader who had a difficult time being at all patient in school, I found dealing with official school processes — most especially in college and grad school — to be really important. I needed that set of external ideas, standards, and assessments, even when they weren’t very good — in the end they helped. Most autodidacts I know who try to do this on their own rarely learn enough from enough perspectives to be “vital thinkers”. I also found that plugging away on stuff that was very difficult to learn really helped — even just the “plugging away” part, and just the idea that one’s own nose is good, but not sufficient.

Frank Oppenheimer — the creator of the San Francisco Exploratorium, a physicist, and the brother of Robert — once told me that when they first set it up he got many complaints from some of the sponsors and donors. They wanted and expected “a science museum” and they complained that what he’d made was a chaos of 2000 children running around and breaking things!

He said to them “You don’t understand. The door to Science is to realize that ‘the world is not as it seems’. We’ve made 500 different hands-on exhibits that show this, and we figure that 2000 kids bumping around 500 exhibits will produce matchups so that a particular kid and a particular exhibit will trigger off the deep hit that ‘the world is not as it seems’. Many children will never be the same after this, and they are ready to step through the door into Science.”

This parable works in the large. A particular “great book” may not match up to a particular curious learner. I advocate reading lots of books, and to get fluent enough in reading to also read books that don’t match up, or sometimes are even mediocre. Some of this overhead will pay off later as one changes and grows. A good library with a few thousand books is a kind of “exploratorium” for readers.

A university is another kind of exploratorium if one has learned that universities have a dual role (a) to provide many perspectives and examples of knowledge that one doesn’t have, and (b) in parallel to be used by the learner to pursue interests. These quite different routes both need to be taken. The former is too random by itself, the latter is too inward and ignorant by itself.

Einstein said that “Love is a better teacher than Duty”. If the love comes from curiosity in the large then enough duty will follow to keep the processes from becoming too narcissistic and narrow.

________

[*] Footnote: the (d) part above was not clear to one commenter. We have limited mental resources — called “cognitive load” — and this means we are easily saturated before we have enough resources on the one hand, or enough capacity left to think our own thoughts on the other. The thoughts will generally be trivial if we don’t know much, but will be only the thoughts of others if we don’t make space for ourselves.

In particular, we are quite overwhelmed by the clamor and detail of what is thought of as “normal” and “reality” in the present. We have to find a quieter place in which to think (and where “reality” can be safely considered as a human construct, and often a social one).

I’ve advocated “learn everything and then forget it except for the perfume”. This can create a mental space for thinking which will inescapably be helped by what we know — it’s really hard to completely forget! — but in which what we know (mostly meaning what we believe!) is far enough away to allow us to feel things, listen to our subconscious whispers, and generally barge around. “Most ideas are mediocre down to bad” so we need to do something with ideas also — one strategy that works is instead of diving in, to note them down so they can also be put on the back burner for possible later use.

When something more promising wafts up, it will include something that is like an “odor”. This odor will very likely be like some of the perfume hanging around, and the perfume can be followed to what you learned. If you’ve been able to suppress the present a bit, the odor will likely take you to parts of the past that didn’t seem relevant (perhaps because they didn’t contribute to the current “reality”).

I think it’s obvious to every reader that if you only learn things that contribute to your near-term specific goals, you will very likely not wind up with any interesting versions of the “other pasts” that can help with new ideas. This is one of the most powerful reasons for general learning rather than just “problem solving learning”. The latter very often misses “problem finding” of what should actually be worked on. The result is that many are not just trapped in a weak present, but lack the perspective to even see that this is a large part of their problem.

Our human context mechanisms — many of which correlate with language — want “crisp topics” — but the oldest senses, especially smell, provide superposition properties — spatial, temporal and communicative — that are extremely powerful for animals that are highly developed for this, and when thought of as a metaphor, for we humans when we are trying to deal with lots of things without getting overwhelmed.

How does Alan Kay identify talent in young researchers?

The wonderful Dave Evans did not look at transcripts, but was interested in resumes and “sparks”. He over-admitted, treated his students like gods, and then (I found out later after I was on the faculty), at around the two year point, the faculty would meet to decide if the student was a “real person” (meaning were they the real deal as a researcher). If the consensus was “no”, then the student would go out one door with a Masters degree; if “yes” they would get a PhD.

ARPA was willing to pay for this, and (I think) many of the ARPA projects had somewhat similar processes. Dave was still a fairly extreme edge of this.

Most of my career has not been directly in universities, but a variant of this idea was used at Parc, and especially for interns and visitors. This is a really nice way to get the kind of experience needed with a person before committing to the long term close relationships that research teams need.

MIT had a terrific intern program with technical companies — called “6A” — where an intern would spend two summers and then a whole year, and wind up with a Masters at the end. The first summer would be “to see”. The second summer, the intern and the groups would have an idea of “match-up”, and the year would be a concerted project.

I think most of my impulses for “interns and visitors” were “sparks”, and a lot of these resulted from talks, and sometimes classes, I’d give.

So the short answer is: give them chances!

Why did the federal government invest $117 million into the Radiation Lab, Servomechanism Lab, and the High Voltage Lab at MIT in the 1930s?

I’m in London right now so don’t have access to my copy of Nilo Lindgren’s book about 100 years of EE at MIT.

But I do recall that both the Rad Lab in Building 20 (eventually invented about half of all the radar systems used in WWII), and the Servo Lab (trainers in WWII, afterwards: Whirlwind computer, core memory invention, early interactive graphics, real-time simulations, etc) were not started until about 1940.

All three of the labs had connections with Vannevar Bush, especially during WWII when he was Roosevelt’s Science Advisor, but I recall that he was already active in DC in the 30s as well as continuing his deep connections with MIT.

I think that the High Voltage Lab was started when they got Van de Graaff from the UK in the 30s to start making Van de Graaff HV generators — and I believe that Bush was also behind this.

(Copying in my answer to Chris Lee’s comment)

There are good books about:

the Rad Lab (more or less initially funded by Alfred Loomis around 1940 anticipating that the US would eventually be in the war, and radar would be needed, and that the superior British technologies should be taken as the starting point)

the Whirlwind part of the Servo Lab (which has a little about the WWII role in the first part of the book).

There is also a pretty good biography of Bush available, which I think is indispensable to much of the technical history of MIT from the 20s on.

Where did Alan Kay learn about how context affects the ability to think?

I was lucky enough to learn to read fluently several years before attending school, and by the time I did, had read enough to have encountered different points of view on some of the ideas that were taught as hard truths in first grade. When I piped up and said “But I read this in a book”, I was not encouraged by the teacher, and I soon got the idea that school was not about “books” but about “the one book” — their book — for each subject. But, via reading, that was already too late for me — I’d gotten into the larger world of ideas.

The first adult book I read all the way through was my father’s copy of Edith Hamilton’s “Mythology”, and the most important part of this book was the tacked on Norse myths after those of Greece. I realized these were essentially the same stories with different names for the characters, and that was illuminating — and also helped greatly in resisting attempts to teach me the local religion.

Another book that my father had was Breasted’s “Ancient Times” about a wide variety of early civilizations, each with its own ways and beliefs. Reading many books, including lots of golden age science fiction in the 40s and 50s helped.

Getting deeply into science as a young child helped. I loved rockets and jet planes around age 5 or 6 (ca 1945–6) and got a present around age 7 of Willi Ley’s “Rockets, Missiles, and Space Travel”. This had a lot of interesting stuff in it, but the most striking thing I can recall from this book was the discussion of Hohmann orbits: basically that a spaceship is also in orbit and can’t just “drive” from one planet to another, but must follow an orbital path of its own to catch up with an also moving planet where it will be, not where it was when the voyage started.

However, I don’t think I can pin down a good answer to Eddie’s exact question. I got interested in why people in crowds do some of the things they do — many of them horrible — and this got me into Sociology, and then into Anthropology, which is about the contextualized thinking of human cultures (I wound up minoring in Anthro in college).

Certainly, I can remember the big hit when I finally understood what McLuhan was driving at. I didn’t understand “Understanding Media” when it first came out, but was motivated a few years later to spend an entire summer trying to understand McLuhan. A good route for me was through his earlier book “The Gutenberg Galaxy”. This is a very large idea about “contexts”: that the fluent learning of a context is not just a simple layer of symbols, but what has to happen to our brain/mind to be fluent in most things.

A book that helped around the same time was Koestler’s “The Act of Creation”, which looks at contexts from a different and very useful perspective.

Another way to look at insights like these is as a transition from things you already know to a different world, where you realize that you understood them, but you didn’t understand them. That can lead to active searches for stronger more illuminating worlds.

What are examples of "Perfect" and "Better" in regards to Alan Kay's "Sweet Spot"?

Mark Miller’s answer nicely captures much of what I was trying to communicate in the aphorism “Better and Perfect are the enemies of ‘What is Actually Needed’”.

The one thing I might add is that one of several useful ways to think about the “MacCready” sweet spot — what is actually needed — is as the lowest thing that is qualitatively beyond “better”.

As Paul said when he started thinking about man-powered flight, which the best people in the world had failed at for more than 50 years despite several large prizes: “The problem is that we don’t understand the problem”.

This got him to drastically change goals (I tell this story in a number of talks which have found their way to YouTube).

He essentially told his crew to forget about man-powered flight but to create an air-frame that could withstand “10 crashes a day” — by being fixable with tape and wire.

With this, in a few weeks they were able to make more test flights (and crashes) than everyone else in history over many decades, and got a much stronger sense of the problem. Once understood, this turned out to be relatively easy to solve, and in about 6 months from the time Paul started thinking about man-powered flight, they won the first of the prizes, and shortly thereafter, the big prize for a man-powered flight across the English Channel.

When asked by British journalists why this was possible compared to everyone else, he said “They were trying to make an airplane, we were trying to do man-powered flight”.

Another way to understand this idea is to notice that the learning process frequently really requires going away from what was thought to be the goal. This is hard for us primates! But if it is really new, it can’t be characterized in the existing context, and a new context has to be created to hold the new idea.

I’ve told this story many times to try to help people understand why Parc and the ARPA-IPTO before it were so effective compared to most other efforts at hardware and software design. The key was that ARPA was very happy to fund “problem finding” — not just the “problem solving” that wrongly-named “rational organizations” wanted. By the time Parc started, ARPA had done 8 years of “finding the good problems” and also creating the researchers who would go to Parc and complete the finding process with deep solutions.

Almost all the projects we chose at Parc were at that special sweet spot where they were just barely doable, but also above the qualitative threshold, so that just barely doing them opened whole new worlds.

I should say something about “Perfect”. Picasso said “A work of art is never finished, just abandoned”, and this captures some of the snares of “Perfect”. Human beings tend to be a lot more pragmatic than idealistic, so “Better” is a bigger problem for most. But there are those — I will include myself here — who hear the siren’s song of perfection.

For these, Picasso’s “abandoning” is good to learn. One way to do it on large projects is to associate with colleagues whose strongest urges are to “complete”, and to just let them do this as they see fit, as one gets entranced by beautiful new unreachables.

What does it mean for a language to have Smalltalk-inspired syntax?

There are three main answers to this question because there were three somewhat different generations of Smalltalk (‘72 and ‘74, then ‘76 and ‘78, then ‘80).

In the first version, each object was passed a reference to the message, and did the work itself to parse and decipher it (this idea was partially inspired by Ned Irons’ (non-object) extensible language IMP). This meant that the programmer was specifying the syntax of the language when defining a class — there was almost no base syntax. The result was a highly extensible language in each of syntax, semantics, and pragmatics. This was slow, but general, and did not guard against every problem (very Lisp-like in many ways, rather like an FEXPR for everything with more help in parsing). There were many possible ways to refine and improve this idea, but we got a lot out of it just from the very first simple implementation, which, in part, was a response to a bet about how small a definition could define everything (see “The Early History of Smalltalk” for more details about how all this progressed).
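To make the flavor of this concrete, here is a toy sketch in Python (emphatically not Smalltalk-72 syntax, and with made-up class and token names): the point is only that the receiver, not a fixed compiler, decides how the incoming message tokens are parsed.

    # Illustrative sketch only (not Smalltalk-72): each object receives the
    # raw message token stream and decides for itself how to parse it.

    class MessageStream:
        def __init__(self, tokens):
            self.tokens, self.pos = tokens, 0
        def peek(self):
            return self.tokens[self.pos] if self.pos < len(self.tokens) else None
        def next(self):
            tok = self.peek()
            self.pos += 1
            return tok

    class Turtle:
        """A made-up receiver that defines its own message syntax."""
        def __init__(self):
            self.x = self.y = 0
        def receive(self, msg):
            # The object, not a central parser, decides what the tokens mean.
            while msg.peek() is not None:
                word = msg.next()
                if word == "forward":
                    self.x += msg.next()       # this message word takes one argument
                elif word == "home":
                    self.x = self.y = 0        # this one takes none
                else:
                    raise ValueError("does not understand: " + str(word))

    t = Turtle()
    t.receive(MessageStream(["forward", 10, "forward", 5]))
    print(t.x)   # 15

Since every class defines its own little grammar this way, the “language” as a whole has almost no fixed syntax — which is both the power and the cost of the Smalltalk-72 approach.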

When the Alto didn’t get superseded as planned, there was a move to optimize Smalltalk, and also clean up its self-definition. Dan Ingalls did a lot of the heavy lifting on this, and Smalltalk-76 — and later the tinier Smalltalk-78 for the portable Notetaker computer — were the results. These went to a fixed syntax scheme so code could be compiled ahead of time into byte codes (like Euler, and my thesis language Flex), and Ingalls chose a simple syntax scheme to reflect the most used extensions and conventions in Smalltalk-72 — much of which could be called “keyword” or prepositional. IMO, his choices were a great compromise, and resulted in a highly readable syntax (certainly my favorite of the later Smalltalks). You can see this language described in full in Dan’s really great POPL ’78 paper (which should be online).

When Smalltalk-80 got worked on, the designers forgot a few things about human beings, and went for a slightly simpler, more uniform syntax, and left out a few ideas that could have helped readability (for example, “implicit self” permits prefixing when that helps readability, especially for control structures like If, While, etc.). Since Smalltalk-80 was the release version from Xerox, this simple keyword-oriented syntax became the standard (I think to the detriment of adoption of the language). There will undoubtedly be explanations of this syntax in other answers to this question.

There have been a few further ideas along these lines since. For example, CodeWorks by Marc Lentzner was a very nice syntactic design for an OOP language, and noted that if you use keywords/prepositions, then you should be able to order the parameters as desired, and also be able to have defaults for omitted parameters. I made a suggestion while still at Parc that a good idea might be to have a simple grammar for making up messages (this would be a nice intermediate between a fixed syntax and the open syntax of Smalltalk ‘72). This was not taken up, but I think that something like this was done for the Grace language.

If you had to choose just one word for your gravestone, what would it be?

Curious

What will happen if one rewrites the entire Linux using Smalltalk?

I’m going to translate this question into “What are good ways to rethink operating systems in our day and age?”

We generally assume that we have computer(s) that not only have CPU(s), memories of various kinds, and connections to networks, but that incorporate some forms of hardware protection for their resources. The nature of the latter will affect the kinds and amounts of work needed to be done “at the bottom”.

On an architecture like the Burroughs B5000 series, almost everything important for an OS is already in the hardware of the machine, including the first great protection scheme that would later be called “capabilities”. At Xerox Parc we made microcoded computers, where the microcode cycles were about 5 times faster than main memory could cycle, and which used 16 program counters to allow the microcycles to be interleaved with zero overhead task switching. This allowed us to quite easily replace what is usually thought of as an OS with the “real object” oriented language Smalltalk. Using an architecture like the B5000 in microcode, and with the ability to reflect what we learned back into the microcode, turned out to be very powerful and very simple to do.

On any kind of machine, we want to allow programmers to safely and efficiently use any levels of the system they need to accomplish their goals. So, on most machines, the interplay between the deep kernel and the hardware will generally be tricky — most computers are not as nice as the B5000 or the Parc computers. For example, any decent CPU these days should have completely protected processes and not have to resort to unprotected “threads” … but this is not generally the case, even though very little silicon is required to do this.

One of the best ideas in the original Unix was to try to make the “protected mode” kernel as small as possible, and get everything else needed via standard user-mode processes. The 1000 lines of C “Linus kernel” quite rightly went back to that idea, and was a great attempt to deal with protection and allocation of all resources with no additional BS.

(It’s worth noting that Unix was done in the early 70s — on PDP-11s — as Parc was starting up with very different ideas about both hardware and software — although Doug McIlroy’s way of thinking about Unix was along some of the same lines.)

We now need to look at the nature of systems and their design to motivate what we’d like to wind up with. There are several good books from the dawn of “systems thinking” that have excellent insights (including those by Bertalanffy, and by Waddington).

One of my favorites is Christopher Alexander’s PhD thesis “Notes on the Synthesis of Form” (which has been continuously in print since 1964). The thesis is about design, and how constraints and dependencies make any real design problem difficult to sort out. Alexander uses Plato’s quote “You should separate Nature at its joints as a good carver would” to point out that how you try to understand systems (and to make them) depends a lot on how many interconnections and dependencies you have to deal with, and these often can be reduced by judicious clustering into modules. (Part of his thesis was a Fortran program he wrote to try to find clusterings that had minimal outs-and-ins.) This is tough to do for architecture, and he moved on to “the timeless way of designing” after moving from Harvard to Berkeley in the 60s.

But there is a lot of gold in this book for computer people to ponder (as indeed we did in the latter part of the 60s and then at Parc). If a system is thought of as “clusters” inter-connected with communications, then it is natural to think of the interior of a cluster as also a system. This gives a natural decomposition, and it also suggests that the clusters need whole computers to model them, and things like networks to allow them to communicate (we can see another source of ideas for “real objects” back then). It is a simple jump to use software “real-objects-that-are-whole-computers” as a universal building block. Some of Alexander’s ideas about constraints, dependencies, interactions, etc. can be used to try to find clusters that “minimize difficulties”.
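To give a feel for the “minimal outs-and-ins” idea, here is a toy Python sketch in the spirit of (not a reconstruction of) Alexander’s program, with made-up module names: given a dependency graph and a proposed clustering, count how many dependencies cross cluster boundaries, and prefer decompositions that keep that count small.

    # Toy sketch: score a proposed decomposition of a dependency graph
    # by how many of its links cross cluster boundaries ("outs-and-ins").

    def crossings(edges, clustering):
        """edges: set of (a, b) pairs; clustering: dict node -> cluster id."""
        return sum(1 for a, b in edges if clustering[a] != clustering[b])

    edges = {("ui", "layout"), ("layout", "fonts"), ("ui", "net"), ("net", "tcp")}
    better = {"ui": 0, "layout": 0, "fonts": 0, "net": 1, "tcp": 1}
    worse  = {"ui": 0, "layout": 1, "fonts": 0, "net": 0, "tcp": 1}
    print(crossings(edges, better), crossings(edges, worse))   # 1 vs 3

A search over clusterings using a score like this is one crude way to “carve a system at its joints”; the interesting part, then as now, is choosing what counts as a dependency.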

Most of these ideas were around in one form or another, and the fun of Smalltalk was to see how far they could be taken in the early 70s on a very small but very nice computer.

An important principle here is that there are techniques not only for making programming languages, but for making them fairly rapidly. If computer people don’t know them, they will be tempted to use something that is around rather than to try to think things through again in the current context. (This is important because, as Butler Lampson liked to point out, Moore’s Law means that a lot of your basic design rules actually change every few years, so you have to be very careful about what you think you know — it could have gotten obsolete right under your nose.)

So I would start from today-into-a-10-year-future and think things through again from scratch.

Certainly, it would be crazy not to think of “the computer” as the whole Internet. A start on this would be to look at Dave Reed’s 1978 MIT PhD thesis on a design for a “whole Internet” OS. (And you can look at “Croquet” and its open source version “Open Cobalt” to see implementations of some of these ideas.)

To keep this from getting even longer, I’ll just pick out a few things from many.

I’d most definitely try to materially advance the state of the art in “dependencies management” along every dimension. Even with great clusterings to make things more understandable, there are still many dependency problems. I’d make a CYC-like “expert system about systems” to help here. Scaling means that names get less and less useful, and active negotiations between modules using descriptions and safe testing get more and more important (the expert system would help here also). I would most definitely incorporate various kinds of “pseudotime” (McCarthy, Reed, Jefferson, Lamport, etc) to eliminate most race conditions on objects’ knowledge (especially distributed replicated objects’ states and processes) and allow “world-lines” for “capturing time”.
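To suggest what a “world-line” in pseudotime might look like in miniature, here is a toy Python sketch (made-up, and not Reed’s, Jefferson’s, or Lamport’s actual designs): every write to a cell is kept with its pseudotime, and a read at pseudotime t sees the latest write at or before t, so the past stays readable and many races simply disappear.

    import bisect

    class VersionedCell:
        """Toy 'world-line': all writes are kept, indexed by pseudotime."""
        def __init__(self):
            self.times, self.values = [], []

        def write(self, t, value):
            i = bisect.bisect_left(self.times, t)
            self.times.insert(i, t)
            self.values.insert(i, value)

        def read(self, t):
            i = bisect.bisect_right(self.times, t)
            if i == 0:
                raise KeyError("no value yet at pseudotime %r" % (t,))
            return self.values[i - 1]

    balance = VersionedCell()
    balance.write(1, 100)
    balance.write(5, 70)
    print(balance.read(3), balance.read(9))   # 100 70 -- the past is still there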

Back to the beginning here. The tendency has been for some “operating systems” to start off clean and minimal — like a good belief system — but then the acolytes start to worship it and start adding junk doctrine, especially ways to avoid doing needed work. (Some operating systems started off like a pop culture and never were clean and minimal!)

It should be easy to see in this day and age that computations need to be in terms of systems (as described above) and that they need to be automatically mapped into various physical computers in a variety of ways. This means that the job of the resident software on a piece of hardware is to do almost nothing except help with protection and allocation of time and space and other resources. It shouldn’t have to know about the computations it is running (that’s part of what protection is all about), and even more important, the computations had better not have to know in the 1st order about where they are running. As far as their resources are concerned, they should be in terms of more modules, which can be delivered to a physical hardware if necessary. Part of the space allocations by the tiny kernel could help with caching, etc.

The idea behind the Internet (the most successful “real object” system in the world of computing) is that the most important agreements should not be in terms of what’s in the nodes but in terms of how the intercommunication between the nodes should be done. It’s worth trying to base future thinking on enlargements of that idea. From this standpoint, we can see that the “interesting language” is the intermodule one.

We could have done this better when doing Smalltalk, but it was still an interesting first pass. Once done, Smalltalk shows how to use an intermodule language as a way of doing things recursively “all the way down” for making systems. But it doesn’t at all preclude having any other kind of language or mechanisms inside an object. That is a large part of the reason for having encapsulated “clusters” be thought of as “objects”.

What does WYSIWYG mean?

What I am doing right now is not WYSIWYG.

For no good reason, and several bad ones, the Quora text input system (and most others in web browsers) makes you type into a separate area that is decoupled from where the text will wind up and what it will look like. Quora does do a bit of decoration (vs e.g. Hacker News) but it is still not what I want to use (and it is not what we first made work, because at Xerox Parc we wanted to work “in terms of our results”). This also included what is now called “live programming” (at Parc it was just “programming”).

Contrary to some of the other answers, it was not primarily about printing, but about working in real-time in terms of your results. One of the metaphorical questions we asked was “Imagine if you played a flute and had to wait seconds or minutes to hear the result and make it better”. This is part of another Parc driving principle: “No Modes”. What’s interesting is that one of the modes we got rid of for text (“insert mode” vs “replace mode”) has been retained. But non-WYSIWYG interactions impose an annoying and unneeded mode between authoring and experiencing.

A little aside to give our definition of “modeless”. It recognized that something like a mode — a window showing something specific — can be very useful, but what is needed is to be able to get to the next thing you want to do without having to explicitly get out of what you are doing. For example, a window with something in it is a quasi-mode, but because you can start any other action directly (e.g. clicking on anything) without having to say “get out of this window”, we would call it “modeless”. In text editing you don’t want to have to get out of “inserting” in order to “replace”. In programming, you don’t want to go through an edit in an editor, submit to a compiler, which submits to a loader, which requires your system to initialize, and so forth. You just want to be able to deal directly and safely with what you are trying to achieve. Etc. Since people make mistakes and also want to explore options, you also need a comprehensive UNDO.

Another aside: note just how really bad the UIs on so-called “smart phones” are (they are brain dead).

So “What You See Is What You Get” is a very big and very important part of personal computing, which includes the idea of symmetric “reading” and “writing”. For no good reasons, all this was thrown away when the web and the web browsers were done. Ironically, the web was done on machines that already had good WYSIWYG applications that had been around for a decade or two.

The large idea was an ARPA-IPTO idea. At Parc, when we decided to sacrifice a lot of our precious RAM to do a bit-mapped display, we realized that by going to this omni-display we could get really good fidelity, and results-mode interactions if we were willing to do the work to make it happen. (I wish more computer people could take the gift of personal computing more seriously and not find reasons to cripple the ideas.)

In my recollection, the term was made up at Parc by Charles Simonyi when he was Butler Lampson’s grad student and did what became Microsoft Word (called Bravo there) as the example for his PhD thesis on “intentional programming” ca 1974–75.

Could a modern-day Xerox PARC be funded through Patreon?

There’s coming up with the funding, coming up with the visions, and coming up with the people.

Butler Lampson estimated that the number of computer researchers at Parc during its most productive early years was “about two dozen”. I.e. “tiny”. I think there were about 40 or so in its second five years (its total span was about 11–12 years depending on how you count).

The biggest difference monetarily between then and now is that the house prices in Palo Alto increased about 10 times faster than inflation. In 1971, the modest salaries at Parc allowed one to buy a house, whereas the comparable salaries today would not. On the other hand the amount spent per year per researcher for computer resources (most of which we had to make ourselves) was enormous.

I mention this because it is more complicated to translate the estimated (small) budget for Parc back then to today. A simple translation for the computer part would be roughly $10M/year today, but more compensation would have to be added — let’s say to around $12M–$15M/year.

Parc made a lot of progress by being willing to build most of its HW and SW from scratch. One of the large parts of that process was to make various kinds of hardware accelerators for software, including programming languages, to try to limit the need for optimization in the software. That would still be necessary today, but the HW side especially would be quite different. Some of how we used fast discrete components to fashion microcoded processors could be done in FPGAs today. But I think that the ability to quickly do state of the art custom chips would be needed.

Whatever the final budget might come to, it would still not be large. I think it could be too large pragmatically for Patreon to muster, but not impossible. The need would be for at least a 5 year commitment on the part of the funders.

Parc had a lofty and large vision about universal personal computing and internetworking for the improvement of civilization that it brought from its roots in the ARPA-IPTO computing research community of the 60s. Graduate students within ARPA naturally thought in these terms and scales, including the idea that most things would have to be built from scratch.

I think both a large enough vision and the special people would be the hardest to come up with these days.

It is certainly hard to locate visions that are anything like the ARPA/Parc ones in scope and romance. There are a few scattered wonderful projects* but not a community.

It’s worth contemplating this dismal state in today’s world of tiny increments to what people might want to buy rather than what they actually need. And especially the oddity that 40–50 years ago, when it was really difficult to make computers and do things with them, people did quite wonderful things — whereas today the computing resources are abundant, but most projects are pretty trivial. Most that still use the old grand terms — e.g. “AI” — have discarded what’s important and difficult about what the terms used to mean, in order to concentrate on the easier-to-do subsets.

* e.g. take a look at what Bret Victor is now doing …

As for the people, it’s certainly the case that there are numerically many times more computing people around with the “IQ”s (if I can use that term for a certain kind of mental power rating) of the Parc researchers.

On the dimension of “creativity” there are enough super-creative people to draw from.

But perhaps the most unusual aspect of the ARPA/Parc culture was that it was strikingly more cooperative and more able to get synergy from colleagues than my perceptions of today. This is too large a subject for this note. Getting that synergy to happen was a deliberate part of the ARPA/Parc process, and to pick just one name — Bob Taylor — there were real efforts to make it happen. (I need to mention that this was far from perfect — but it was quite a bit better than processes I’m aware of today.)

Trying to sum up: when I think about this stuff in relation to questions about Parc, I keep coming back to the enormous and I think necessary influence of the ARPA community on Parc, from the standpoints of perspective, desires, vision, methods, people, and colleagues to continue working with after the fact.

In several lectures I gave recently at Stanford, I tried to sum up by saying “the goodness of the results correlated most strongly with the goodness of the funders”.

Why is it possible to use an object as a parameter directly?

In Smalltalk, everything is an object and it does pass parameters, so …

As to “why?”, it’s worth thinking about what a data structure “knows”, what the receiver of a data structure needs to know, what an object knows, and what the receiver of an object needs to know.

If you can see that the answers are: not a lot, a lot, a lot, and not a lot, then you have answered your question.

You can start to see why using objects to simulate data structures is not a great idea at best, and gets worse and worse as your system gets richer and larger.
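A rough way to see the asymmetry in code (a made-up Python example, not from any particular system): the receiver of a bare data structure has to know its layout, and that knowledge spreads through the whole system, while the receiver of an object only has to know what to ask for.

    # A data structure "knows" very little; whoever receives it must know a
    # lot about its layout, and that knowledge leaks everywhere it travels.
    record = ("Alan", 1940, ["Smalltalk", "Etoys"])    # which field is which?
    age_guess = 2024 - record[1]                       # the caller must know

    # An object "knows" a lot; the receiver only needs to know what to ask.
    class Person:
        def __init__(self, name, birth_year):
            self._name, self._birth_year = name, birth_year
        def age_in(self, year):
            return year - self._birth_year             # the layout stays hidden

    age = Person("Alan", 1940).age_in(2024)

When the internal representation changes, the first style breaks every receiver; the second style breaks nothing outside the object.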

What resources does Alan Kay recommend for learning real object oriented programming?

Following on from Mark Miller’s answer:

One resource for morphogenesis would be “Endless Forms Most Beautiful” by Sean Carroll.

We get children started on systems by having them make systems with lots of parts — like epidemics, the foxes and the rabbits, “ants”, ecologies, etc. It is pretty easy to model some systems things like “Ethernet” in a child’s language such as Etoys.

The basic idea here is that “simple programs are too simple to learn about programming, design, systems, etc.” On the other hand, complex programs are too hard to do when you are a beginner.

A sweet spot is massively large systems with relatively simple interactions. A number of good ones are found in Mitchel Resnick’s book “Turtles, Termites, and Traffic Jams”, which introduces StarLogo.

Many of these are SIMD, so they don’t have enough interactions to get beyond “data”.

What’s needed are examples where a simple “data-centric” approach is too fragile (we need to get beyond SIMD).

A good “real oop” language will allow:

children’s examples of this kind to be done very straightforwardly

simpler examples to easily be done staying with “real oop” rather than degenerating into simple data and procedure programming

rather complex systems — like operating systems, and simulations of complex engineering, scientific, and biological systems — to be done without adding unnecessary complexity

What does Alan Kay think of the Swift programming language?

I don’t think my opinion matters. But it’s worth looking at the stated goals for the particular design and implementation of any programming language and (a) see how well they match up to the system in use, and (b) what important goals have been left out.

For example, for C, (a) it wasn’t too bad for its original goals, and (b) almost nothing that has been written in it — especially by humans — in the last 25–30 years or so should have been. This is not a slam on C but on the attempts by humans to use a tool of particular range and use in areas — like large scale software systems — where it is not at all suited.

Most languages do more or less well at (a) but not so well at (b) and usually get increasingly worse at (b) over time.

So — for Swift — I think we could almost ignore (a) and look to see how it does for (b).

What do you think?

If Tony Hoare said that “debugging is harder than programming and you shouldn’t use all of your cleverness to write the program” — or make anything — then what role does planning have in engineering and software design?

The famous Tacoma Narrows bridge that vibrated itself to bits in a strong wind was planned, but the means to do a good simulation did not exist when it was built. Today, we would expect that a bridge we built would anticipate most stresses. The old joke was: “But test pilots still wear parachutes!” With today’s planes, a parachute might not help at all, so more care needs to be taken even when testing (and better simulators help everyone and everything).

It’s hard to tinker a great sculpture from malleable clay just by debugging. Planning is a must in most art. But the process is also a finding, so we have to preserve the paradoxes that “plans have to be debugged” and “engineering has to still learn in the process of building”.

In my experience, we had quite a bit of luck in first debugging our ideas before worrying about the amount of optimization necessary for practical use. Since some of our ideas involved real-time interactions by humans, we needed to do our experimental work on individual super-computers that were fast enough not to need optimizations, and we generally had to design and build these as part of the process.

One way to think about this part is that it is like the “SIM” part of CAD->SIM->FAB in modern engineering. We can SIM the ideas and much else of software if we have extra capacity. For example, the “Worlds” system of Alex Warth et al, allows fine grained UNDO and “possible worlds reasoning” at most levels of execution. This allows a “cradle” to be built around the project that will allow every event to be captured and studied if necessary.
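The actual Worlds system is written up on the Viewpoints Writings page; the following is only a toy Python sketch of the control idea, with made-up names: a sprouted “world” reads through to its parent, keeps its own writes local, and can either be thrown away (a deep UNDO) or committed back.

    # Toy sketch of the "worlds" idea (not the actual Worlds system):
    # a child world reads through to its parent, keeps writes to itself,
    # and can be discarded (undo) or committed (merged into the parent).

    class World:
        def __init__(self, parent=None):
            self.parent, self.vars = parent, {}

        def sprout(self):
            return World(self)

        def get(self, name):
            if name in self.vars:
                return self.vars[name]
            if self.parent is not None:
                return self.parent.get(name)
            raise KeyError(name)

        def set(self, name, value):
            self.vars[name] = value             # local to this world only

        def commit(self):
            self.parent.vars.update(self.vars)  # propagate changes upward

    top = World()
    top.set("balance", 100)
    trial = top.sprout()                        # explore a possible world
    trial.set("balance", trial.get("balance") - 30)
    print(top.get("balance"), trial.get("balance"))   # 100 70
    trial.commit()
    print(top.get("balance"))                         # 70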

Finally (for here), it’s worth thinking about what good tools for reformulation of programs might look like. I think I’d start with where Smalltalk left off, find the next set of improvements in dealing with always running code, and think a lot harder about how to deal with dependencies in graceful ways, and especially how desired dependencies can be preserved while reformulating.

Can Smalltalk code be written on "bare metal" hardware?

I’m not sure I understand this question, but -sure-!

When a bunch of us were at HP in the mid-2000s we did a bakeoff for a children’s machine (a few years before OLPC) where there was not really enough room for the speed, graphics ability, and low power consumption. We invited many vendors to bring their boards with the CPUs installed. Overnight we would bootstrap Smalltalk (its Squeak version which was particularly aimed at “bit-identical” bootstrapping) and besides running our own benchmarks, we could see what it would be like to interact with comprehensive GUIs and graphically heavy apps.

(The CPU we chose was also the one — for similar reasons — that was chosen a few years later for the OLPC XO.)

The route to that kind of bootstrapping really depends on what’s already there (i.e. how “bare” the hardware is). It is fun to think about how to do it from absolutely bare metal, but this is never the case these days. There is always a C compiler at least. In fact Squeak was written in itself, and the critical parts of this were a subset that could be translated into C. So bootstrapping was really dealing with various resources provided by the vendor board.

An historical note. There was a guy by the name of Ted Steele in the 60s whose hobby horse was UNCOL (universal computer oriented language) to solve the n*m problem. It was kind of an abstract assembly code. Instead, C for good and ill wound up being the UNCOL of the last 20+ years.

Ian Piumarta has done a number of “from nothing” bootstrap systems.

Dan Ingalls and I had a very pleasant conversation a few weeks ago on the way back from visiting Bletchley Park about whether you could — with today’s large RAM — do a higher level bootstrap “fast enough” so that the garbage collector could be written in the higher level language before you ran out of memory ...

Again, the existence of C makes this more of an artistic gesture rather than one of desperation.

Another conversation we had — which included the work of Alex Warth — was about a better minimal bootstrap of Val Schorre’s Meta II system. Alex did one that was nice (it turned into his UCLA thesis), but I think there is still gold to be mined in that POV.

I returned to this (Oct 27th) because there are a few more cases to ponder.

Let us suppose that the machine doesn’t have any software yet, what then? When I was working my way through the U of Colo as a programmer for the National Center for Atmospheric Research (NCAR) in the early 60s, the powers that be decided to get a Control Data 6600, one of the first real supercomputers. The head programmers at NCAR were certain that CDC would not be on time with their SW (there had been much slippage with the current 3600 machine at NCAR). So they decided to do much of the systems software themselves just in case. My job was to take the manuals of the 6600 and write an efficient simulator of the machine for the 3600. I did, and they did, and so we had a fair amount of debugged system software ready to be loaded on the 6600 when the bare machine started working. (The rest of this story is fun, but off topic, but I should mention that one of the benefits of a simulator is that you can put in a lot more facilities for dynamic debugging than the real HW easily allows.)

BCPL, the predecessor of C, was in the form of a byte-coded machine that one would first get going on the new HW, and it had the facilities to then write more efficient machine code generators and recompile itself, so it could work as a systems language on the new HW. One of the ways to bootstrap BCPL, was to simulate the new HW (as above), and get a lot of the new system running before even the new HW got going.

What are the best books about the history of computers?

This is a very good question! I’m in the UK for a while right now and don’t have access to my home library, so I’ll add to this answer incrementally as I think of more items. However, I also invite all readers of this to provide some of their favorite history sources.

I will confess that I don’t know of a great “master comprehensive” book for the whole field. We should really try to find one — I think it would be of the most use for the Quora audience.

——- added June 2022 ——-

I should have included some old classics about old machines e.g.

“Giant Brains” by Ed Berkeley

“Faster Than Thought” — B.V. Bowden

“Computing in the 20th century” — Nick Metropolis

“Whirlwind”

“AN/FSQ-7” (Sage)

“LEO”

---

One area to look at is what ACM has made happen in terms of histories.

There are three “History of Programming Languages” books that cover many of the major ones.

There is a “History of Workstations” book with contributions by many of the major inventors of personal computing.

There is a book of the first 25 Turing Award lectures. The other Turing lectures can be found on the ACM site.

and quite a bit more

I’m not so thrilled about the Computer History Museum time-lines, etc. but they have a lot of stuff. The “Fellows” elected are arguably a more representative list of names for the history of computing since the 50s than the Turing Award. Running down their backgrounds would reveal a lot.

The best book for ARPA-IPTO/Xerox Parc is

“The Dream Machine” by Mitchell Waldrop

Other books I can think of (let’s run down the titles)

Maurice Wilkes’ “memoirs” - includes history of EDSAC

I think there is a book about Konrad Zuse — maybe “the first” modern designer builder

There’s the Atanasoff–Berry computer — which might have predated Zuse’s

History of Whirlwind

There are a number of pretty good Babbage/Ada histories

More to come … (and let’s have more contributions …!)

How does a program like Microsoft Excel get written from scratch?

One good answer is “find out how Dan Bricklin and Bob Frankston did the original spreadsheet Visicalc on the 8-bit micro Apple II”.

Another approach is that the abstraction of a spreadsheet is simple, so make a prototype using any language that can make instances. This will run very slowly, and will reveal that a lot of what a useful spreadsheet is about is the optimizations for both calculating and for viewing.
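As a hint of how small that abstraction is, here is a minimal sketch in Python (made-up, and deliberately unoptimized): a cell is either a value or a formula over other cells, and reading a cell recomputes it on demand.

    # Minimal spreadsheet abstraction: a cell is either a value or a formula
    # over other cells; reading a cell recomputes it on demand.  There is no
    # caching, no change propagation, no cycle detection, and no display --
    # which is exactly where the real work of a useful spreadsheet goes.

    class Sheet:
        def __init__(self):
            self.cells = {}

        def put(self, name, value_or_formula):
            self.cells[name] = value_or_formula

        def get(self, name):
            cell = self.cells[name]
            return cell(self) if callable(cell) else cell

    s = Sheet()
    s.put("A1", 3)
    s.put("A2", 4)
    s.put("A3", lambda sh: sh.get("A1") + sh.get("A2"))   # "=A1+A2"
    print(s.get("A3"))   # 7
    s.put("A1", 10)
    print(s.get("A3"))   # 13 -- recomputed, because nothing is cached

Everything else a real spreadsheet needs — caching results, recalculating only what changed, and the whole viewing machinery — is the optimization work just mentioned.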

While you are at it, try to think about what horrendously got missed going from Visicalc ca 1978–9 to Excel, which was being prototyped ca 1984.

For ideas about spreadsheets you could also look at my Scientific American article “Software” for Sept 1984.

What does "software architecture" mean?

When a term is borrowed from somewhere else — for example: “science”, “engineering”, or “architecture” — it is usually important to start with what the terms meant in the developed serious disciplines they were taken from. So we should ask “what is modern science?” and try to critique “computer science” partly on that basis. Similarly, we have “software engineering” — we need to look at what real engineers in the civil, electrical, aviation, biotech, etc. disciplines actually do, how they think about it, how they go about it.

The exercise we just did will reveal that the two terms above are at the very best -aspirations- and at the worst the Emperor’s New Clothes. I.e. a much too high BS component.

When I look at “architecture” I see a wide range of design activities that range from the utilitarian to the beautiful, and from sculpture to highly synergistic new non-linear combinations of basic materials that allow new kinds of designs to be thought about, and then to be built.

In “architecture” the “arch” doesn’t refer to “an arch” — it could, but it doesn’t — but to “the top, the best, etc”: it’s a superlative of a kind — like Moriarty being Holmes’ arch-enemy. The “tecture” refers to building.

Below I start to drift away from “layman’s terms”, but I’d like to mention a beautiful book (for everyone) that gives an excellent “layman’s sense”: “Cathedral” by David Macauley. The only area that doesn’t quite have enough narrative — especially for the child readers — is how and why arches work, and just where the forces arise that need something like flying buttresses as a solution (architects have to think about these kinds of things also).

The good news is that computing doesn’t entirely lack examples of great science, great engineering, and great architecture. But good and great examples can be hard to find. Sometimes they are relative to their time, and sometimes they are timeless.

Some of my personal favorites are Bob Barton’s Burroughs B5000 computer (more than a half dozen of the best SW ideas ever in the actual hardware of a higher level computer), John McCarthy’s Lisp (the invention of a new kind of computer stuff (better bricks are also part of architecture)), Butler Lampson’s Project Genie OS, Dan Ingalls’ Smalltalks, Gerry Popek’s Locus OS, etc.

And of course, the Internet (and its earlier relatives, the ARPAnet, the Ethernet, and the AlohaNet). But not the web or the web browser.

I’ve gone a bit beyond “buildings” in the above. So one question we could ask is “what do we have to do to grow a beautiful city?” And “what do we have to do to keep it beautiful?” And “can we grow it larger and keep it beautiful?”

These are good questions also with respect to SW, because sometimes a beautiful system gets done, gets popular, and then gets quite messed up in the process.

What is the best way to promote a non-traditional Smalltalk programming language implementation?

I don’t know, I’ve never tried to promote any kind of programming language. Certainly, the things that have attracted me to various programming languages have mainly been compelling examples. I think this is the place to start.

What did Alan Kay mean by, "Lisp is the greatest single programming language ever designed"?

First, let me clear up a few misconceptions from the previous answers. One of them said “Try writing an operating system with Lisp”, as though this would be somehow harder. In fact, one of the nicest operating systems ever done was on “The Lisp Machines” (in Zeta-Lisp), the hardware and software following the lead of “The Parc Machines” and Smalltalk — and we in turn had been very influenced by the Lisp model of programming and implementation. (These operating systems in both Smalltalk and Lisp were both better (claim) and easier to write (simpler to demonstrate) than the standard ones of today.)

Another interesting answer assumed that “the test of time” is somehow a cosmic optimization. But as every biologist knows, Darwinian processes “find fits” to an environment, and if the environment is lacking, then the fits will be lacking. Similarly, if most computer people lack understanding and knowledge, then what they will select will also be lacking. There is abundant evidence today that this is just what has happened.

But neither of these has anything to do with my praise of Lisp (and I did explain what I meant in more detail in “The Early History of Smalltalk”).

To start with an analogy, let’s notice that a person who has learned calculus fluently can in many areas out-think the greatest geniuses in history. Scientists after Newton were qualitatively more able than before, etc. My slogan for this is “Point of view is worth 80 IQ points” (you can use “context” or “perspective” etc.). A poor one might subtract 80 IQ points! (See above). A new more powerful one makes some thinking possible that was too difficult before.

One of our many problems with thinking is “cognitive load”: the number of things we can pay attention to at once. The cliche is 7±2, but for many things it is even less. We make progress by making those few things be more powerful.

This is one of the reasons mathematicians like compact notation. The downside is the extra layers of abstraction and new cryptic things to learn — this is the practice part of violin playing — but once you can do this, what you can think about at once has been vastly magnified. There were 20 Maxwell’s Equations in their original form (in terms of partial differentials and cartesian coordinates). Today the four equations we can think about all at once are primarily due to their reformulation by Heaviside to emphasize what is really important about them (and what is likely to be problematic — e.g. the electric and magnetic fields should probably be symmetric with respect to movement, etc).
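For reference, the four equations in the Heaviside/Gibbs vector form (SI units, in LaTeX notation) are:

    \nabla \cdot \mathbf{E} = \rho / \varepsilon_0
    \nabla \cdot \mathbf{B} = 0
    \nabla \times \mathbf{E} = -\,\partial \mathbf{B} / \partial t
    \nabla \times \mathbf{B} = \mu_0 \mathbf{J} + \mu_0 \varepsilon_0 \,\partial \mathbf{E} / \partial t

Four lines that can be held in the mind at once, versus twenty component equations that cannot — that is the point about notation and cognitive load.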

Modern science is about experiencing phenomena and devising models whose relationships with the phenomena can be “negotiated”. The “negotiation” is necessary because what’s inside our heads, and our representation systems, etc., have no necessary connection to “what’s out there?”.

Taking this point of view, we can see there can be a “bridge science” and “bridge scientists” because engineers build bridges and these furnish phenomena for scientists to make models of.

Similarly, there can be a “computer science” and “computer scientists” because engineers build hardware and software and these furnish phenomena for scientists to make models of. (In fact, this was a large part of what was meant by “computer science” in the early 60s — and it was an aspiration — still is — not an accomplished fact).

The story behind Lisp is fun (you can read John McCarthy’s account in the first History of Programming Languages). One of the motivations was that he wanted something like “Mathematical Physics” — he called it a “Mathematical Theory of Computation”. Another was that he needed a very general kind of language to make a user interface AI — called “The Advice Taker” — that he had thought up in the late 50s.

He could program — most programs were then in machine code, Fortran existed, and there was a language that had linked lists.

John made something that could do what any programming language could do (relatively easy), but did it in such a way so that it could express the essence of what it was about (this was the math part or the meta part or the modern Maxwell’s Equations part, however you might like to think of it). He partly did this — he says — to show that this way to do things was “neater than a Turing Machine”.

Another observation about this is that the “slope” from the simplest machine structures to the highest level language was the steepest ever — meaning that the journey from recognizable hardware to cosmic expression is a rocket jump!

As is often the case — especially in engineering — a great scientific model is often superior to what exists, and can lead to much better artifacts. This was certainly true here. Steve Russell (later famous for being the main inventor and programmer of “SpaceWar”) looked at what John had done, and said: “That’s a program. If I coded it up we’d have a running version”. As John remarked: “He did, and we did”!

The result was “unlimited programming in an eyeful” (the bottom half of page 13 in the Lisp 1.5 manual). The key was not so much “Lisp” but the kinds of thinking that this kind of representational approach allowed and opened up regarding all kinds of programming language schemes.
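The crucial thing about that half page is that it is Lisp written in Lisp. As a hint of just how little machinery is involved, here is only a toy Python sketch of a Lisp-like evaluator (emphatically not McCarthy’s definition): handling variables, quote, if, lambda, and application already gives a complete, if tiny, programming language.

    # Toy evaluator for a tiny Lisp-like language.  The original was,
    # crucially, written in Lisp itself; this Python version only suggests
    # how small the essential machinery is.

    def evaluate(exp, env):
        if isinstance(exp, str):                      # variable
            return env[exp]
        if not isinstance(exp, list):                 # number or other constant
            return exp
        op = exp[0]
        if op == "quote":
            return exp[1]
        if op == "if":
            return evaluate(exp[2] if evaluate(exp[1], env) else exp[3], env)
        if op == "lambda":                            # ["lambda", [params], body]
            params, body = exp[1], exp[2]
            return lambda *args: evaluate(body, {**env, **dict(zip(params, args))})
        fn, *args = [evaluate(e, env) for e in exp]   # application
        return fn(*args)

    env = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
    square_of_sum = [["lambda", ["x"], ["*", "x", "x"]], ["+", 3, 4]]
    print(evaluate(square_of_sum, env))               # prints 49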

A fun thing about this is that once you’ve grokked it, you can think right away of better programming languages than Lisp, and you can think right away of better ways to write the meta descriptions than John did. This is the “POV = 80 IQ points” part.

But this is like saying that once you’ve seen Newton, it becomes possible to do electrodynamics and relativity. The biggest feat in science was Newton’s!

This is why “Lisp is the greatest!”

Is there a programming language that's effectively a successor to Smalltalk?

Well, let's see. The first usable Smalltalk was designed and implemented by the end of 1972. That makes it 45 years old, and the main ideas about OOP I contributed go back to the end of 1966 (that makes this particular line of thought — dynamic programmable OOP — 51 years old). If Moore's law represents a doubling every 18 months to 2 years (depending on what you look at), then the changes in combined scales since then are probably at least a factor of 100,000,000.

The big ideas have fared better than the implementations we did back then, but quite a few of the more important future looking ones — for example, having objects be able to negotiate their interoperability with other objects — were barely thought about, and not worked on.

In my mind, a real successor to Smalltalk is really needed, and it needs to be qualitatively a successor, not just a better version of the old ideas.

Scaling does need to be done much better to deal with today and tomorrow. Another dimension that is very important is expressibility. Smalltalk at Xerox PARC was able to do its "OS", its graphics system and UI, its (live) development system, and a number of "personal computing abilities" involving media, high quality fonts, etc. in about 10,000 lines of always operating code (and given that this is a moving target, we could just as easily say 20,000 lines of code to make the same point).

So, we should think about something important and large in functionality that a real successor of today or the near future could bring to life in 10,000 to 20,000 lines of code (vs e.g. code that is 1000 or more times larger).

Certainly — harking back to Sketchpad — we should call for a deep set of abilities to go automatically from the "whats" to the "hows" via a constellation of integrated problem solvers under the hood. We really want to do a lot of the programming of the future in terms of "runnable requirements", etc.

One big thing that we talked about and some interesting experiments made, and more recently a general facility made (the “Worlds” system on the Viewpoints Writings page), is the “simulation of time” for many purposes: “possible worlds reasoning”, parallel transactional functional relationships from one world “level” to the next, generalized UNDO, etc. This should definitely be done on the next serious effort for a new programming language.

Similarly, we should call for a much better approach to how software development is done. Smalltalk pioneered a lot of IDE ideas (and we got a lot of ideas from several of the previous Lisps). But to be done seriously, software has to take engineering seriously, and should now look to see what the integrated CAD -> SIM -> FAB tools are like in the real engineering disciplines (civil, electrical, bio, aero, etc.)

All in all, I may be missing something out in the hinterlands, but what I see when I look out at the programming landscape is rather tiny little incremental improvements (with occasionally some really bad regressions) — I don't see serious efforts to invent "programming for the 21st century".

We can blame bad funders for some of this. But I wonder what would happen if good funding showed up. How many computer people are thinking about "what is actually needed" rather than "what would be a little better"?

What does Alan Kay use in lieu of PowerPoint?

I’ve given a lot of talks using Etoys (in Squeak), a programming environment we made for children (starting at Disney in the late 90s) that also had decent media design. The current day Scratch is a subset descendant of Etoys, but without many of the useful media abilities. Another interesting system that I used for many years was “Frank”, the comprehensive system we built for the STEPS project in a few 10 thousand lines of code.

Many of the talks I’ve given both use and require comprehensive end-user programming — and PPT does not supply this (despite having both the Smalltalks and Hypercard preceding it). So it’s both annoying and maddening. A good talk of a sort can be given in PPT if one omits some of the terrible conventions — like bullet points, lots of text, etc — and concentrates on having the visuals complement the words rather than being a cheat sheet.

My favorite recent talk using programmable media was the one a few years ago for the Ted Nelson tribute. For “Christmas fun” in 2014, several of us worked for a few weeks to resurrect Smalltalk-78 (the vintage of Smalltalk that Steve Jobs saw in the famous visits to Parc in 1979) from a file that had been rescued from a disk pack that Xerox had thrown away. This was the Smalltalk that had been made for the portable “Notetaker” computers we had made; it was nicely sculpted by Dan Ingalls for this purpose; and was completely written in itself, including the “OS”, development tools, etc — everything — in about 10,000 lines of code plus about 6k bytes of machine code for the Notetaker CPUs.

This was my favorite of the “engineered” Smalltalks (that came after -72), and was great fun to program in. I was very happy to do the media presentation for the Ted Nelson talk (on YouTube) entirely in it.

https://www.youtube.com/watch?v=AnrlSqtpOkw&t=135s

Despite the one-bit-per-pixel graphics, I’ve thought about doing more talks using this. The biggest difference between then and now is that today there is much more room for images than back then: we had them but the memories were too small to use more than a few.

As a beginner, what are the best ways to approach Computer Science?

If you are just looking to get a job in computing, don’t bother to read further.

First, there are several things to get clear with regard to any field.

What is the best conception of what the field is about?

What is the best above threshold knowledge to date?

How incomplete is the field; how much will it need to change?

When I’ve personally asked most people for a definition of “Computer Science” I’ve gotten back an engineering definition, not one of a science. Part of what is wrong with “CS” these days is both a confusion about what it is, and that the current notion is a weak idea.

The good news is that there is some above threshold knowledge. The sobering news is that it is hard to find in any undergrad curriculum. So it must be ferreted out these days.

Finally, most of the field is yet to be invented — or even discovered. So the strategies for becoming a Computer Scientist have to include learning how to invent, learning how to change, learning how to criticize, learning how to convince.

Most people in the liberal arts would not confuse learning a language like English and learning to read and write well in it, with the main contents of the liberal arts — which, in a nutshell, are ideas. The liberal arts spans many fields, including mathematics, science, philosophy, history, literature, etc. and we want to be fluent about reading and writing and understanding these ideas.

So programming is a good thing to learn, but it is most certainly not at the center of the field of computer science! When the first ever Turing Award winner says something, we should pay attention, and Al Perlis — who was one of the definers of the term, if not the definer — said: “Computer Science is the Science of Processes”, and he meant all processes, not just those that run on digital computers. Most of the processes in the world to study are more complex than most of the ones we’ve been able to build on computers, so just looking at computers is looking at the least stuff you can look at.

Another way to figure out what you should be doing, is to realize that CS is also a “blank canvas” to “something” kind of field — it produces artifacts that can be studied scientifically, just as the building of bridges has led to “bridge science”. Gravitational and weather forces keep bridge designers honest, but analogous forces are extremely weak in computing, and this allows people who don’t know much to get away with murder (rather like the fashion field, where dumb designs can even become fads, and failures are not fatal). Getting into a “learned science of designs that happen to be dumb” is not the best path!

We (my research community) found that having an undergraduate degree in something really difficult and developed helped a lot (a) as a bullshit detector for BS in computing (of which there is a lot), (b) as a guide to what a real “Computer Science” field should be and could be like, and (c) to provide a lot of skills on the one hand and heuristic lore on the other for how to make real progress. Having a parallel interest in the arts, especially theater, provides considerable additional perspective on what UI design is really about, and also in the large, what computing should be about.

So I always advise young people -not- to major in computing as an undergraduate (there’s not enough “there there”) but instead to actually learn as much about the world and how humans work as possible. In grad school you are supposed to advance the state of the art (or at least this used to be the case), so you are in a better position with regard to an unformed field.

Meanwhile, since CS is about systems, you need to start learning about systems, and not to restrict yourself just to those on computers. Take a look at biology, cities, economics, etc. just to get started.

Finally, at some age you need to take responsibility for your own education. This should happen in high school or earlier (but is rare). However, you should not try to get through college via some degree factory’s notion of “certification” without having formed a personal basis for criticizing and choosing. Find out what real education is, and start working your way through it.

What are the backgrounds of engineer-scientists and scientist-engineers?

I don’t know how to give a reasonable answer (because there is quite a spread in personalities, skills, and knowledge across the very large fields). But, I can give a “non-reasonable” attempt at this. If you take “STEM”, realize that the “T” for “Technology” just means whatever humans make — and so is too large and diffuse a category — and look at the historical order of the fields, we can use “T” for “Tinkering”, then “Engineering”, then “Math”, then “Science” to get TEMS.

Giving the past its due, it is quite reasonable to date what we think of as Science today to the turn of the 17th century; and especially, the amazing “Principia Mathematica” of Newton was shockingly powerful both in method and philosophically. It also exerted a huge magnetic field on TEM, pulling the Venn diagram of the 4 pursuits much closer together. A lot of the big hits of the last 150–200 years have happened in the sweet spot intersection.

Again, I don’t know how to generalize here, but I’ve had the pleasure of working in that sweet spot for quite a few years, so I can at least say something from my own experience.

The people I worked with all had gotten quite good at pretty much the same things, with regard to Math, Engineering, and Science — basically, top levels of knowledge. To me, the main differences could be put down to temperament and personality type — and what they wanted to achieve, what it meant to have done good work. The result of this was that everyone had a lot to talk about, and everyone had the background to understand the different points of view and goals. Usually the different types could work together, and there the differences created magic.

If computer science looked more like engineering, what would "software simulators" do?

Let’s try to use analogies as parallel as possible to “science” (what physics, chemistry, biology, etc. do), and to “engineering” (what civil, electrical, mechanical, bio-, etc. engineering do).

And let’s also look for interesting overlaps. For example, making bridges is “very engineering”, but once a bridge is made it manifests “phenomena” and scientists can study it to make models that manifest like the phenomena. Herb Simon (both Turing Award and Nobel Prize) wrote a terrific book called “The Sciences of the Artificial” that is worth reading.

Nowadays it is hard to do “good bridge” without both deep science and deep engineering — and some deep computing, because we most definitely want to simulate our understanding/models of bridges both for edification, and also for both success and safety. And from these better models we’ll get better insights about how to design and build better bridges.

To add an historical note here, an early act of real computer science happened when John McCarthy needed a programming language to program his “Advice Taker” UI in the late 50s, and he wanted it to be mathematical in form. To do this he made a model of computing and wrote it up as a start on “A Mathematical Theory of Computation”. This was “very science”! Steve Russell said “That’s a program. If I programmed it we’d have a new language”. As McCarthy said, “He did, and we did!”. The result was Lisp — and it was also a “better bridge” than what was studied: the model suggested much better ways to think about programming than how FORTRAN went about it.

Let’s suppose we invent a programming language and we’d like to evaluate our design as early in the process as possible. One way to express the design is as a model — it’s a kind of a “theory”. Unlike regular math there are usually too many degrees of freedom for classical proof techniques, but we can get a lot out of being able to run our theory.

At this point, let’s pause to think about what we are doing. On the one hand we are in something very like the CAD->SIM->FAB process that good engineering disciplines have (and let’s pretend we have similar powerful tools for our domain). We can see that the “FAB” part of the process in making a language useful involves a lot of various kinds of optimization to fit existing machines, other systems, etc, (a lot of stuff). This is very similar to what has to be done in the physical world for FAB: lots of important nitty gritty stuff. Now what could “SIM” be? How about getting our CAD model language to run as directly as possible on a supercomputer so that we can write serious programs (including real-time ones) to see how we like our language. (Note that if we’ve defined the semantics of our language in a good way in the CAD part, then our tools should be able to more or less directly run and simulate the model.)

On the other hand, we are very interested in the “computer science” aspects of our design. If we like Al Perlis’ (the first Turing Award winner) definition: “computer science is the science of processes” (he meant all processes!), then our CAD->SIM will allow us to deeply probe the process phenomena of our design (and we really need to!). For example, being able to program complex parallel systems is one of the keys to making progress in the future, and the problems are far from being just engineering type problems: they are also deeply scientific in nature.

In short, we aren’t very good at computing, thinking about computing, or doing computing. We need to get better designs vetted and improved much much quicker. To do this we need to have an intermediate “simulation” phase and the tools to go with it on every worthwhile project we want to do. And having these will suggest many projects we should be doing but just can’t see.
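To make the CAD->SIM part a bit more concrete, here is a minimal sketch (in Python, using a made-up toy postfix language purely for illustration): the “CAD” part is a small description of the language’s semantics, and the “SIM” part is simply running that description directly on trial programs, long before any “FAB” style optimization.

    # A minimal sketch of "CAD -> SIM" for a toy language design (illustrative only).
    # The "CAD model" is a small table giving the meaning of each operation; the "SIM"
    # is a generic runner that executes that model directly, with no optimization.

    SEMANTICS = {
        "+":    lambda stack: stack[:-2] + [stack[-2] + stack[-1]],
        "*":    lambda stack: stack[:-2] + [stack[-2] * stack[-1]],
        "dup":  lambda stack: stack + [stack[-1]],
        "swap": lambda stack: stack[:-2] + [stack[-1], stack[-2]],
    }

    def simulate(program, stack=()):
        """Run the language model directly on a program to see how the design feels."""
        stack = list(stack)
        for token in program.split():
            stack = SEMANTICS[token](stack) if token in SEMANTICS else stack + [int(token)]
        return stack

    print(simulate("3 4 + dup *"))   # [49] : (3 + 4) squared

If we decide we dislike one of the operations, or want different arithmetic, we change the model and immediately re-run our trial programs; that is the point of having a SIM phase at all.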

What does Alan Kay think about Gerry Sussman's recent talk at the Strange Loop conference titled "We Really Don't Know How to Compute!"?

I like to read rather than watch videos. I’ve always been a big fan of Gerry since he was a grad student at MIT, and have generally agreed with him. If he is complaining about current programming and programming languages then he is probably right and I’d probably agree with him.

I’m motivated enough to get this transcribed and read it**. Sound odd? I can remember pretty much everything I read, and can read a lot faster than I can “watch” — I feel as if someone is subtracting brain cells from me when I have to watch something (except maybe someone good playing music). (So I’ll get back to this in a while.)

Bottom line is that Sussman is interesting. An interesting person is someone who is interesting whether you agree with them or not. So it’s always worth finding out his opinion.

---

** Thanks to Hrvoje Krot for transcribing this talk (lyssphacker/talks)!

When I found out a few days ago that this had happened, I read the talk, and now will try to answer the question more directly.

I think this is a terrific talk by a guy who has never been afraid to have — and let us know — his own opinion (and, most importantly, has been willing to do the work to substantiate his opinions).

The heuristics he gives are very worthwhile to internalize. His arguments about and against this, that, and many other things are all worth pondering deeply.

I think he would agree that one of the properties that computing shares with more traditional math, is that the parts of it that are understood can be understood completely. The nature of the descriptions is to be about themselves in a way that can exist entirely in our minds without worrying about the “noise of physics” etc. In fact, computing looked at this way is a new branch of mathematics.

This is another way of supporting his quite right assertions that computing should not be conducted anything like a religion, with creeds to be dogmatically believed in, and followers of creeds, and inter-creed disputes and warfare.

Instead he argues for “really thinking things through” and this “from scratch” as much as possible. Those who don’t are continuing to make a major mess of things and issues that need to be solved rather than patched over.

The “bottom line” above is substantiated. I don’t agree with every notion in this talk, but I’m very glad to have heard all of them.

How do you know when you have the chops to build your own tools?

I don’t know a good general answer. The ARPA-Parc gang got some sense of it by building tools, and sometimes undergoing what Butler Lampson once referred to as “bitter experience”. In those days of new computer architectures and non-standardized operating systems (or no operating systems) there was more tool building as a matter of course, and a lot of dives down ratholes never to return.

This led in the field to a bit of a separation into a large group that did not want to build tools anymore, and a smaller group that was motivated to find better ways to build tools, including making tools to build tools.

Parc was a bit of a sweet spot in time because quite a few heroic things had been done in the larger community in the years just before: for example, we were all impressed that the Engelbart group had adapted and made a “compiler-compiler” to make their own systems programming language for the SDS-940 (a tiny slow machine) and did it well enough to make it really pay off. Many of the people at Parc — like Butler, Peter Deutsch, etc. — had done a number of tough systems, including operating systems and programming languages. Some of us liked how “nice” and “tidy” Lisp was (if only something like it could be run fast — maybe on a custom machine?).

The other thing that helped was that the number of computer people at Parc in the early days was just a few dozen, and there were a lot of things we wanted to do. The predominantly scientific backgrounds of the researchers responded to “too big and too complex” with “we need better maths!”. (This is a bit hard to explain these days — but take a look at how Maxwell’s Equations could fit on a T-shirt from what they were in Maxwell’s papers.)

Translated: make super high leverage to deal with the problems. One of the solutions was whole-system design, so that the custom hardware optimized the software rather than having to do SW to work around the hardware, and this allowed “Lisp and beyond” software solutions to stay simple and efficient enough to have real power.

The simple key of “doing tools” is that they mustn’t take over the goals, so they have to be done quickly and well — for example, Chuck Thacker did the whole first Alto in just a little over 3 months; Dan Ingalls got the first version of Smalltalk going in a month, from some handwritten kernel code that took me about two weeks to come up with.

This is where the “chops” come in (and part of “chops” — when you are playing jazz or classical — is that first and foremost you have to produce “real music”, and this means you have to stay within your current abilities and concentrate on the art). You “get chops” by practicing for some years before you have to perform …

I needed to add that having chops still doesn’t mean every endeavor is going to work … we occasionally at Parc would get into “little troubles” from biting off a bit more than we could chew, and these were sometimes pernicious. The recoveries from a couple of these later made for good stories around the campfire.

What did Alan Kay mean by, "I made up the term object-oriented, and I can tell you I did not have C++ in mind."?

It’s hard to praise too highly the programming languages that are the bridge from one way of looking at programming to much better ways of looking at programming. The two greatest such in the 60s were Lisp and Simula. Perhaps the greatest single conception of a software system of the 60s was Sketchpad.

As explained elsewhere on Quora, and in “The Early History of Smalltalk”, I had chance encounters with Sketchpad and Simula in my first week of grad school in late 66, that shocked me into a realization about “computers as basic and universal units” via the connections and parallels with other like things, such as biological structures, computers on networks, processes in time-sharing systems, general systems of parts intercommunicating, and so forth. I started to think about dynamic languages to make such processes, and how the processes could be made efficient and parsimonious enough to be universal.

Someone asked me what I was doing, and without thinking, I said “object-oriented programming”. (A very bad choice as it turned out, for many reasons.)

In the first years of the 70s at Xerox Parc with the great help of a terrific research group, and especially Dan Ingalls, we were gradually able to solve the software engineering problems to make this “systems approach” to programming practical (and especially on the emulation architecture hardware of Chuck Thacker). The starting place was how Lisp systems were built and made efficient.

This led to real power of expression, and a number of breakthroughs at every level, from the user interface to the metal.

This attracted great attention, and 10 years later in the 80s “object-oriented languages” started to appear. The only ones that had some of the same flavor as Smalltalk were several versions of Lisp. There were several “object Pascals”, even an “Object COBOL” (!), and of course, C++.

It’s important to realize that C++ was part of a chain of ideas starting around 1979 by Bjarne Stroustrup who was also shocked into action from encountering Simula. He was not trying to “steal from Smalltalk” in any way. Here’s what he says in his history paper:

C++ was designed to provide Simula’s facilities for program organization together with C’s efficiency and flexibility for systems programming. … The goal was modest in that it did not involve innovation, and preposterous in both its time scale and its Draconian demands on efficiency and flexibility.

Elsewhere, he takes pains to say that he’s “not trying to do what Smalltalk at Xerox Parc did”. He was essentially trying to do with C what Simula did with Algol.

His approach was via “Abstract Data Types” (which the co-inventor of Simula — Dahl — also liked), and this is the way “classes” in C++ are generally used. And, similar to Simula being a preprocessor to Algol, C++ was a preprocessor to C: the “classes” were program code structuring conventions but didn’t show up as objects during runtime.

And for a variety of reasons, some of them not good, some reasonable, C++ got very popular.

It became part of the “colonization” of the term “object-oriented”. By the end of the 80s we could not explain the software at Parc as being “object-oriented” because the term had become co-opted. This kind of thing happens a lot in many areas. So it was understandable, and Bjarne certainly wasn’t to blame. The term became a kind of fad.

But it was — and is — quite annoying. I had to start calling the Parc stuff “real objects”, etc.

The push-back on complaints was the same one for pop changes in language usage and meaning i.e. “the people determine what a word or term means”.

But the -ideas- of OOP that worked so well for us at Parc came from many places, and we liked Goethe’s quote: “We should all share in the excitement of discovery without vain attempts to claim priority”. We did what we did, and we called it what we called it.

Hence the quote, which is not so much about C++ per se but about the term that we had been using to label a particular approach to program language and systems design.

—— Note: Quora tends to bury comments, so I’m copying an interesting one here along with my reply:

Abhinav Sharma: Thanks! I’m curious if you considered rebranding “real OOP” once the term had gotten colonized? I realize it must feel sad to have the term you coined get misconstrued but do you think a rebranding would be a good way for the ideas themselves to live on?

AK: I think the ideas are living on for historical purposes for those who are interested. And something can be learned about human nature by comparing implemented artifacts from the past with what got accepted and why.

On the other hand, the 70s idea of “real OOP” was hugely powerful back then, but what was implemented was far from a complete set of ideas, especially with regard to scaling, networking, etc. Dave Reed’s MIT thesis happened in 1978 and this dovetailed with ideas I thought important also, but funding and other circumstances delayed implementing his version of “real objects” until the early 2000s (this was Croquet).

How dynamic objects intertwined with ontologies and inference was explored by Goldstein and Bobrow at Parc. Their 4 papers on PIE and their implementation were the best extensions ever done to Smalltalk, and two of the ideas transcended the Smalltalk structure and deserved to be the start of a new language, and perhaps have a new term coined for it.

One way to look at this is that computers have enormous degrees of freedom and capacity, and this makes it very difficult to come up with great tools to program them. This is a mismatch with how humans learn and particularly with how we learn and do skill based activities. The latter tend to be very conservative, whereas the demand of computing is that we have to shed our skins every few years (and we don’t).

I don’t think that “real OOP” as we thought of it then, is the way to go in the future (and didn’t then). Consider Sketchpad, and that it is programmed in terms of constraints that the system solves to guarantee a very close fit between what was desired and what is manifested. This is an early glimpse into “requirements-based programming”. It has something like objects — hard to get away from the main idea — but is “relational” rather than message-based (the messages are implicit, etc.) Sketchpad was a tour de force in many ways, including its solvers. But they didn’t go far enough to handle general requirements. Today I think this is doable via a half dozen new techniques plus enormously larger machine capacities.

Stuff like this is what we should be working on!

I don't want be a specialist. How can I be a great generalist in computer science?

Every once in a while I’m asked “What does it mean to be a Computer Scientist?”. My answer is “I don’t know exactly, but take a look at Butler Lampson. He is certainly the quintessential Computer Scientist if ever there was one”.

We find in Butler someone who covers in knowledge and skills a very wide range in computing and software engineering, who has not just designed/invented but also built operating systems, programming languages, networking and internetworking systems, desktop media, specification languages, atomic transaction protocols, hardware architectures, and detailed designs for making hardware … and more. A partial summary is here.

He must have been the easiest Turing Award winner to choose in 1992 (the top recognition in computing), and is one of the few ever to have also been awarded the top engineering award, the IEEE John von Neumann Medal. He was inducted into both the National Academy of Sciences and the National Academy of Engineering. Etc.

Once in an ARPAnet design meeting in the 60s, Butler was supposed to give a one-hour talk at the end of the day, but the meeting dragged on. When it was his turn, he looked at his watch and said “I’ve got a plane I have to catch”, and proceeded to give the -entire- talk in 20 minutes, in perfectly understandable English at great speed! It was shockingly impressive even from him. The person next to me said “It’s a great privilege to just know someone like Butler”. That talk — on a capability-based operating system design — covered the content in a paper you can read.

Butler is also one of the clearest writers in our field. Here is a partial list of his papers over the years. And here is the first page of his website with a few tips on stuff to look at that has been most popular with readers.

Certainly in the top few of the most impressive people I’ve ever met, and especially fun to have worked with.

Now to try to answer your question. I think that — besides Butler’s obvious brilliance — there are a few keys to his range and depth, and many of these revolve around -systems and processes- rather than hardware and software, or programming languages and operating systems. Dealing with whole systems was in the zeitgeist of the ARPA community back then, and it found Butler while on his way — after an undergrad degree in Physics from Harvard — to grad school in Physics at Berkeley. As he puts it “I walked in the wrong door and found people trying to debug code for a new computer. And never found the doorway out”.

He wound up being the principal designer of the operating system of this computer — Project Genie — which was a breakthrough in practical and affordable time-sharing, and in its commercial version — the SDS 940 — both was the computer that Engelbart’s “mother of all demos” was done on, and which also launched the commercial time-sharing industry.

Butler was and is interested in -anything- that has dynamic relationships — systems and processes — (and note the analogies with what is studied in physics). I think looking at the world this way — all systems and processes (including physical and biological) — will round you into a “generalist in computer science” who can be a vital contributor, not just a dabbler. There are only a few people around like Butler, but how he has gone about his art can be and is an inspiration for the rest of us.

How could public libraries disrupt themselves?

Libraries made their way in the US partly because they provided something not easily found otherwise. Still, it’s sobering to realize that about half of all the libraries in the US by about 1920 were there because they were funded by a single philanthropist (Andrew Carnegie). He was an immigrant child laborer from Scotland and started his climb to immense wealth in his early teens because his first boss opened his house library to his laborers on Saturdays.

Most “Carnegie Libraries” had open stacks (apparently the idea originated with these grants), and almost always a special room for children, and another special room where reading was taught in the evenings.

In the ARPA-Parc work in the 60s and 70s on personal computing and pervasive networking, two of the most compelling images for the future were to take ideas like the Carnegie Libraries (with their special children’s rooms and rooms where you could learn to read) and the Whole Earth Catalog (with its perspectives and interlinked “systems of systems” of ideas) to the next levels that computers would make possible — not just to imitate them and make them more convenient, but to extend what literacy, reading, writing, understanding could be.

Today, libraries are on the decline, along with bookstores. -Something- is more convenient for some people, and whatever this is has started to displace the old resources and venues. As with many things in the age of electricity and electronics, the displacement is not being done by replacement with media of similar value, but by subsets, sometimes disastrous subsetting. For example, especially for good readers, none of the screens available now are as good to read from.

We can find out about the existence of a lot of small things, but generally speaking we don’t wind up with whole books when we use a search engine for something. So in many cases — as with replacing newspapers with TV — we lose depth and width. As media scholars have pointed out, this started happening in important ways after the telegraph ca 1845 made currency of information more important than content.

Similarly, going to smaller screens with smaller viewing angles (regardless of the pixel density) is a subtle barrier for extensive reading. Going to non-keyboards is a subtle barrier for writing real prose (rather than transcribing short oral modes of thought).

Another — I think disastrous — change is from what one got from the open stack approach: you went looking for one thing, and found many things you didn’t know to look for (and even about the subject you were interested in). So it is surprising (from the standpoint of reason and idealism) that Amazon has done a pretty terrible job of providing UIs that make serendipity more possible (they haven’t done zero, but I wonder if they do understand the larger idea here). This is a shame because I love that Amazon has such a large catalog, and especially a second-hand catalog. If you know what you need and want, there’s a good chance you can get it.

On the other hand, Amazon at present doesn’t have a “free” library idea — that is a library supported by taxes for those who can’t afford to buy a book every other day. Libraries are quite expensive as individual transactions. (Some years ago the Dartmouth library in a study said $55 per book taken out. This cost is subsumed for the public benefit to make a “free” library. This got Dartmouth to do the same thing for the entire town with the university’s time sharing system! You could find a teletype at a gas station with the operator doing his accounts in a program he had written in Dartmouth BASIC!)

And it’s very surprising to me that there doesn’t yet seem to be a free system online that will teach people to read. If semi-AIs are to have any real value at all, this kind of service is what is needed from them.

To try to answer Eddie’s question, I think we will see further erosion of the physical libraries unless something unique and important about the physical structures can be realized and enhanced. How about the teaching and children’s functions of the original Carnegie Libraries? Both of these are enhanced by having people learning and doing together. How about providing much better ways to browse books? And to “take books out”? All of the functions suggested in this paragraph are being done poorly by the current consumer companies: there is room for something of higher value and supported by the local communities and philanthropists.

One of many of Carnegie’s insights is that libraries are not just for access to information, but that they create important social and amplifying roles in their local communities. Better ideas along these lines will make a big difference.

And online ones could as well, and for the same reasons, given the poor subsetting that is now being accepted as a replacement. The progression has been to redefine “normal” downward, and this makes it difficult for unsophisticated people to see the decline (things look somehow better — but they are confusing convenience for content). Today, it’s a bit hard to imagine a force for good to make this happen. We don’t seem to have Andrew Carnegie types today (we do have some large billionaires who do some philanthropy, but so far there is not much of a comparison). A reasonable political system (the oxymoron of the century) would be focusing immense resources both on the early grades of school and on community learning centers, especially in the evening (how about adding libraries to community colleges, etc.?)

Finally, I think I’d take a shot at dealing with the homeless (who are often the most numerous in libraries these days). Maybe this is too big a task to try to include therapy and hope as part of a public institution’s functions, but on the other hand, why not? The Greek lawgiver — Solon — was reputed to have said “You can judge a society by how well it treats its least fortunate citizens”. I think our society has a long way to go in this regard.

What was the total economic impact of research at Xerox Parc in the 1970s?

Hard to nail down and substantiate (and it would be good and interesting to do so). Some years ago I saw an estimate — I think from a university — that said “in excess of $30 Trillion dollars, perhaps as much as $35T”. I have no idea whether this was really drawn from comprehensive data (and how can you draw lines between (say) the impact of integrated circuits and that of personal computing, the GUI, the Internet, OOP, laser printing, etc.?).

However, figuring an average of about $1T/year contribution from ARPA/Parc (impossible for me to separate these) within the much larger whole sector since 1980 — perhaps a little less — seems pretty reasonable to me (that’s not a substantiated estimate, just a supposition).

Another way to look at it could be to assign value based on importance. Integrated circuits are right up there. A very large part of the market completely required the GUI to be invented. A substantially overlapping part of the market completely required the Internet to be invented. ARPA/Parc was the main factor in the inventions of the last two.

One part of this I do know with more foundation is that Xerox paid for all of Parc many times over from the profits of just one of the inventions: the laser printer by Gary Starkweather. I don’t know exactly how much the four labs of Parc cost in the decade of the 70s, but it was probably something on the order of $200 million in today’s dollars. So even one billion of profit is a factor of 5 ROI, and the actual profits were far larger.

What was Xerox PARC's mission statement?

I don’t recall one — I’m pretty sure, though not totally, that there wasn’t an “official” one. But the intent of Jack Goldman — the Chief Scientist of Xerox, a physicist, who decided to make PARC — was to devote a small percentage of Xerox’s R&D resources to “non-directed” research in the areas of Physics and Chemistry, Optics, and Computing.

The understanding was that the scientists would very seriously take Xerox’s present and possible and desirable futures into account when deciding what to work on. The previous sentence is an informal “mission statement”.

I think this worked very well. It’s hard for me to recall any project done in any of the labs that wasn’t sincerely thought of by the researchers as being for the benefit of the company — and I think most of the choices made in all the labs really were of potential benefit to the company. I also think that it was the researchers themselves who decided what to work on within this context.

It was certainly the case that “the company” (meaning the corporate executives) really had a tough time understanding many of the results. The disconnect with the results from the computer labs is famous. But let me mention the wonderful flat screen display technology done with particle migration (much later used in the eInk displays on many eBook readers) — one of whose main researchers was Anne Chiang. Xerox had many of the key patents on many kinds of flat screen technology (including LCDs and amorphous semiconductors). But “the company” just wasn’t willing to invest in taking their special inventions to manufacturing and marketplace.

I needed to mention here that despite the disconnects, Xerox made billions from one of the Parc inventions — the laser printer by Gary Starkweather — and paid for all of Parc many times over. This is the biggest point of this story: that the multiples from new inventions are often so high that doing risky research in many areas with top people and modest resources is a not to be missed strategy.

If you had to teach ‘Computer Science 101’, how would you begin your first lecture?

Let’s think about this question for a bit. Some years ago I was invited by “CS Departments” to give talks at a number of universities. Almost accidentally I asked the first audience of undergrads, grad students and professors for their definition of “Computer Science”. Not one could give other than an engineering definition. I did this at each new place on the tour with identical results. Other questions I asked included “Who was Doug Engelbart?” A few said “didn’t he have something to do with the mouse?” (this was extremely upsetting since my research community had put in a lot of effort to make it possible to answer such questions with two or three clicks of the mouse that Engelbart indeed had something to do with). The problem was partly lack of curiosity, partly quite narrow internal goals not having to do with learning, partly not having the faintest idea of what a “science” might be about, etc.

I’ve been an adjunct faculty member of the UCLA CS Department for a few decades (basically a full professor but I don’t have to go to faculty meetings). I occasionally teach classes, and occasionally with freshmen. Over the years, the already low curiosity about computing has dropped considerably (and it has also gotten a lot more popular as it is seen as a route to a well paying job by learning to program and getting certified by a “top 10” school). Accordingly no student has been heard to complain that the first language at UCLA is C++!

I think we are faced with a situation where both the meanings of “computer” and of “science” have been done in by massive weak conceptions to create a new term — a kind of a designer jeans label — that sounds good but is quite hollow. A related term that has similarly been done in is “software engineering”, again by not taking the largest ideas of both “software” and of “engineering” and combining them (the large senses of both were the intent in the 60s when the terms were originally coined).

Since one of the reasons to actually go to a university is to go beyond mere vocational training, and instead to contact deeper ideas, I think it is quite reasonable for any good introduction to a field to try — through examples when possible — to have the learners touch the real stuff and to start to get a feeling about what is interesting and important and core to the field.

First graders delight when you show them that a ruler on top of another ruler can make an “adding machine” where they can outperform 5th graders adding fractions. They will then happily contribute to making even better adding machines. They have touched a real computer — a physical or mental tool to help us think. They have learned a really powerful way to represent numbers — more powerful than their schools will teach! They’ve been able to connect their commonsense idea of “addition” as “accumulation” to something analogous with powerful new properties. They have programmed it to make it solve a large variety of addition problems. They have extended it. Etc. It is not a digital computer. It is not a stored program computer. But it is the essence of a computer. Just as the Antikythera mechanism is the essence of what a computer and computation is all about.
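For readers who want to “touch” the two-ruler adder without a classroom at hand, here is a minimal sketch (an illustration of mine in Python, not part of any actual curriculum): slide the top ruler to the right by the first number, and the answer appears on the bottom ruler directly under the second number on the top ruler.

    # A toy picture of the "two rulers make an adding machine" idea (illustration only).
    # Slide the top ruler right by `a`; the bottom number under `b` on the top ruler is a + b.

    def show_ruler_addition(a, b, length=12):
        top    = " " * (3 * a) + "".join(f"{n:>3}" for n in range(length + 1 - a))
        bottom = "".join(f"{n:>3}" for n in range(length + 1))
        print("top:    " + top)
        print("bottom: " + bottom)
        print(f"{a} + {b}: look under {b} on the top ruler -> {a + b}")

    show_ruler_addition(3, 4)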

How far and large can we go before things get out of hand and we lose our way in too much abstraction? I’ve always been partial to the characterization of Alan Perlis — the first Turing Award winner and who perhaps coined the term “Computer Science” — who said in the 60s: “Computer Science” is the “science of processes”. All processes. For Quora, let’s not try to pound this further or make it into religious dogma. Let’s just happily use Al Perlis’ idea to help us think better about our field. And especially how we should go about trying to teach it.

Now we need to look at the modern meaning of “Science”, and Perlis was quite firm that this should not be diluted into old meanings (for example, the gathering of knowledge) and uses (such as “Library Science” or even “Social Science”). He meant “science” as trying to understand phenomena by making models/maps that try to exhibit, “track”, and predict the phenomena.

I’ve given several talks on how many of the best maps and models can often fit on a T-Shirt, as we have seen Maxwell’s and other Equations.

An analogy here is that there is a “Science of Bridges” even though most bridges are made by people. But once a bridge is made, it exhibits phenomena, scientists can study them, models of many kinds can be made of bridges, and large comprehensive and useful “Theories of Bridges” can be formed. The fun of this is that then new bridges can be designed and made (I’ve mentioned before that there is hardly anything as fun as getting scientists and engineers together to tackle things big and important!)

Herb Simon — a Turing Award and Nobel Prize winner — called all of these “Sciences of the Artificial” (and wrote a terrific book with the same name).

Let me give an example here. In the 50s companies and universities were building stored program computers, and starting to program them — and there was a special moment when Fortran appeared in 1956 — not the first higher level language, but arguably the first one engineered well enough to be used in many different areas, including many that had only been done previously in machine language.

All of this produced “phenomena”.

The story of Lisp is more complex than I can tell here, but John McCarthy got interested in trying to find a “Mathematical Theory of Computation”, and he was determined to make both sides of the phrase work as well as possible. The function eval that will interpret a Lisp program into its meaning, just does fit on a T-Shirt!

This makes it tiny compared to a “programming system”. More important, this “theory of computation” was also a much more powerful conception than Fortran! It was a better idea of a bridge!

The tininess of Lisp allows the whole idea of programming at a deep level to be grasped in one eyeful and a few fingers, and to be thought about in ways not possible when looking at enormous artifacts (this is one of the reasons that scientists like their maths to be compact and powerful). The math used here is a new math, because it allows notions of before and after, and this led to the “logic of fluents”, also by McCarthy, that allows both functional relationships and reasoning to be preserved while also allowing state and the passage of time. (This is still not understood today in the rough and tumble world of ad hoc programming.)

Lisp as both a powerful programming language, and a metalanguage that can express its own theory, is an example of real “Computer Science”.
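To give a feeling for the “fits on a T-shirt” point, here is a minimal sketch of an eval for a tiny Lisp-like subset. It is written in Python rather than in Lisp itself, so it is emphatically not McCarthy’s eval, just an illustration of how small the kernel of a whole programming language can be.

    # A minimal sketch (not McCarthy's actual eval) of how small a language kernel can be.
    # Programs are nested Python lists standing in for S-expressions.

    def evaluate(exp, env):
        if isinstance(exp, str):                       # a variable: look it up
            return env[exp]
        if not isinstance(exp, list):                  # a number or other literal
            return exp
        op, *args = exp
        if op == "quote":
            return args[0]
        if op == "if":
            return evaluate(args[1] if evaluate(args[0], env) else args[2], env)
        if op == "lambda":                             # args[0]: parameter names, args[1]: body
            return lambda *vals: evaluate(args[1], {**env, **dict(zip(args[0], vals))})
        f = evaluate(op, env)                          # otherwise: apply a function
        return f(*[evaluate(a, env) for a in args])

    env = {"+": lambda a, b: a + b, "*": lambda a, b: a * b, "<": lambda a, b: a < b}
    square = ["lambda", ["x"], ["*", "x", "x"]]
    print(evaluate([square, ["+", 3, 4]], env))        # 49

Even this toy gives variables, conditionals, closures, and application in a dozen lines; the point is that the whole idea of a programming language can be held in one eyeful.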

If you learn this, and other things like it, then you will be a much more powerful thinker, and more in charge of your own destiny than if you had just learned to program in Fortran, or its modern equivalents (… you will be on your way to actually learning how to be a computer scientist!).

You will learn a lot more about the special kinds of design that are needed in computing (for example, it is not generally appreciated that computing often really requires one to transcend the computing environment one starts with: one of the special characteristics of stored program computing is that it is not just material for a program, but material for a whole new computer).

Another reason for Al Perlis’ choice of definition is because computing in the large is much more about making systems of many kinds than it is about algorithms, “data structures”, or even programming per se. For example, a computer is a system, computations are systems, the Ethernet and Internet are systems, and most programs should be much better systems than they are (old style programming from the 50s has lingered on to the point that it seems to many that this is what “programming” is and should be — and nothing could be farther from the truth).

The Internet is a good case in point — unlike most programs today, the Internet doesn’t need to be stopped in order to fix or improve it — it is much more like a biological system — by our intention — than what most computer people think of as a computing system. And as such it is much more scalable and robust than almost all software systems done to date. This should really be pondered before weak ideas are given to beginners as representing “computer science”!
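As a toy illustration of that “fix it without stopping it” property (a sketch of mine, nothing like the real Internet machinery), here is a tiny message-handling “system” whose behavior is upgraded while it keeps running:

    # A toy "always running" system (illustration only): messages keep flowing while we
    # replace a handler in place, with no shutdown and no restart.

    import queue, threading, time

    handlers = {"greet": lambda name: print("hello, " + name)}
    inbox = queue.Queue()

    def run():
        while True:
            kind, payload = inbox.get()
            if kind == "stop":
                return
            handlers[kind](payload)

    threading.Thread(target=run).start()

    inbox.put(("greet", "world"))
    time.sleep(0.1)
    handlers["greet"] = lambda name: print("good evening, " + name)   # upgrade the live system
    inbox.put(("greet", "world"))
    inbox.put(("stop", None))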

So what we need to do in a good first course in computing is to take into account what the students can do in the beginning, and then try to stay inside their “cognitive load” limits to help them get to what’s really important in a way that they can both touch and make it themselves. It is critical to “stay real” here and find ways that are intellectually honest and can fit those just starting to learn. (Please don’t teach bad ideas just because they seem a little easier — a lot of bad ideas are a little easier!)

The students need to start by making something themselves that has most of the important characteristics I’ve discussed here. It needs to be a system of multiple dynamically interacting parts, etc. A good way to assess what language to use is whether it is simple to make something that has thousands of interacting parts! If not, you need to find one. The worst thing that can be done is to get the students on a path to a too weak fluency that is a too limited subset of the big ideas. This just kills them — and we want to grow them not kill them.

How would you interpret Marvin Minsky's idea that “Computer science already has a grammar. What it needs is a literature.”?

The most interesting aspect of Ken’s blog entry (including the comments) is that nowhere do we find an historical reference to this idea. In fact, more than 50 years earlier in the 60s it was a part of many a conversation, and if I recall correctly, a few papers.

I first heard the idea from Bob Barton when in grad school ca 1967, and he said that this was part of the motivation that Don Knuth had for writing his “Art of Programming”, chapters of which were already being passed around. One of Bob’s main questions back then was about “programming languages designed to be read by humans as well as machines”. And this was a main motivation for parts of the design of COBOL in the early 60s. And perhaps more important for our subject here, in the very early and quite beautifully designed interactive language JOSS (mostly by Cliff Shaw).

As Frank Smith has pointed out, a literature starts with ideas worth discussing and writing down; it often partially generates representations and extends existing languages and forms; it leads to new ideas about reading and writing; and finally to new ideas that were not part of the original impetus.

Part of the idea of “being literate” is both doing the reading and writing, and incorporating references to other writings that may be of interest. For example, Marvin’s Turing Award lecture starts out with him saying that “The trouble with computer science today is an obsessive concern with form instead of content.”

He meant that what’s most important about computing is meaning, and how it can be dealt with and represented, rather than one of the big topics of the 60s, how to parse both programming and natural languages. For him, what was most interesting about his grad student’s — Terry Winograd’s — thesis at the very same time was not how well it could handle English grammar (it was very good at it) but that it could put meaning to what was said to it, and could reason using that meaning. (This is an echo from the past of what Ken reports in Marvin’s blog.)

A parallel way to look at the “universally learned language” idea is again from seeing the difference in what different writers can discuss and illuminate through writing. Quite a bit can be done without changes to the language, and even without adding vocabulary. Similarly with math symbols and syntax — it is very easy to write a formula that the reader can easily parse, but where the meaning would require them to study for a few years. This is part of what Marvin is driving at. One of the fun Turing machines in Marvin’s book “Computation: Finite and Infinite Machines” (one of my favorite books ever), is a fairly typical computer with only two instructions (add 1 to a register, and subtract 1 from a register and branch to a new instruction if the register goes less than 0 — there are many variants).
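Such a two-instruction machine is easy to play with. Here is a minimal sketch of one variant (an encoding of mine in Python, not Minsky’s notation), with a two-line program that adds one register into another:

    # A toy simulator for a two-instruction counter machine (one of the many variants).
    # ("inc", r, nxt): add 1 to register r, then go to instruction nxt.
    # ("dec", r, nxt, alt): if register r > 0, subtract 1 and go to nxt; otherwise go to alt.

    def run(program, registers):
        pc = 0
        while pc < len(program):
            instr = program[pc]
            if instr[0] == "inc":
                _, r, pc = instr
                registers[r] += 1
            else:
                _, r, nxt, alt = instr
                if registers[r] > 0:
                    registers[r] -= 1
                    pc = nxt
                else:
                    pc = alt
        return registers

    # Add register B into register A by moving B's contents over one at a time.
    add = [
        ("dec", "B", 1, 2),   # 0: if B > 0, take 1 from B and go to 1; else halt
        ("inc", "A", 0),      # 1: add 1 to A, go back to 0
    ]
    print(run(add, {"A": 3, "B": 4}))   # {'A': 7, 'B': 0}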

This is a universally learnable programming language, but note the gotchas. A reasonable solution to “universally learnable” also needs to have certain kinds of expressive power — which will likely take longer to learn — and then you start having tradeoffs with a public that really doesn’t want to do much learning on anything.

Don’s interest in what he called “literate programming” led to an authoring system (originally called WEB) that would allow Don to explain the very program that was being written, and which included many features to allow extraction of parts of the program for human examination. The idea was that the WEB document was the program, and the compiler could extract the compilable and runnable parts from it.

Another early wrinkle was the idea of dynamic media which was an active idea in the late 60s, and for many of us was to be an important part of interactive personal computing. One of the several motivations for this idea was to have something like Newton’s Principia in which the “math” was dynamic and could be run and was attached to graphics, etc. This was part of the impetus behind the Dynabook idea in 1968. One of the terms that started to be used back then was “active essay”, where the kinds of exposition and argument one expects in an essay are enhanced by having an interactive program as one of many media types for the new kind of document.

Some very good examples were done in Hypercard by Ted Kaehler and myself in the late 80s and early 90s. Hypercard wasn’t directly set up to do this — the scripts were not media objects for the cards — but you could do some work and get scripts to show on the cards and be interactive. A particularly provocative example was “Weasel” which was an active essay explaining a part of Richard Dawkins’s book “The Blind Watchmaker” by allowing a reader to experiment with a framework that used a kind of breeding process to find target sentences.
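For readers who have not seen it, here is a minimal sketch of the kind of breeding process the “Weasel” essay let readers experiment with (a re-creation in Python for illustration, not the Hypercard original):

    # A small re-creation (not the Hypercard original) of the "weasel" breeding idea:
    # copy a parent sentence with occasional mutations and keep whichever child is
    # closest to the target sentence, generation after generation.

    import random

    TARGET  = "METHINKS IT IS LIKE A WEASEL"
    LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

    def mutate(parent, rate=0.05):
        return "".join(c if random.random() > rate else random.choice(LETTERS) for c in parent)

    def score(s):
        return sum(a == b for a, b in zip(s, TARGET))

    parent, generation = "".join(random.choice(LETTERS) for _ in TARGET), 0
    while parent != TARGET:
        parent = max([parent] + [mutate(parent) for _ in range(100)], key=score)
        generation += 1
    print("reached the target in", generation, "generations")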

It’s worth pondering that despite the almost perfect fit of what Hypercard was about to the emerging Internet — and its wide adoption by the early 90s — the people who did the web decided not to adopt either it or the larger early Engelbart ideas. And Apple — who had many ARPA/Parc people in its research wing — refused to listen to them about the importance of the Internet and that Hypercard would make a great start as a symmetric reading and writing system for it. Apple declined to make a browser at a time when a really good one would have been significant, and might have made an enormous difference in how the “public face” of the Internet would turn out.

If we move forward a few years, we find the absolute absurdity — even almost obscenity — of a web browser without a real authoring system (consider how silly wiki authoring had to be to even work at all), and as one of many prime examples, the Wikipedia article on e.g. LOGO, running on a computer, yet without any ability for the reader of the article to try LOGO programming from within the article. This meant that what was important about computers was blocked off from the users in favor of various embodiments of old style media.

It’s worth pondering that Wikipedia was and is a prime genre for thinking about, inventing, implementing, and writing “the literature of computing” that is needed (and that would certainly involve both reading and writing of many kinds of media including programming).

It’s even more worth pondering that I can’t write a program here in this Quora answer — in 2017! — that will help show what I’m trying to explain, despite the immense computer power that lies under this weak idea of interactive media. This is completely missing “what happened?”.

To get a glimmer of the problem, here is a system from 1978 that we resurrected a few years ago partly to do a tribute to Ted Nelson, and partly for the fun of it. (Please start here at 2:15)

The whole system is an early attempt more than 40 years ago at what I’m talking about. An explicit example can be seen at 9:06. Besides “dynamic objects”, one of the key insights here is that the “views” — the media that are seen on the page — can be handled uniformly and independently of their contents (we called these “models”). Everything is a “window” (some have explicit frames and some don’t show their frames). They are all composited on the “project page”. Another insight was, since you have to compose and composite some things, make it so all things are composable and compositable.

I think unsophisticated users are to be forgiven for not being able to criticize bad designs. But computer people who make interactive media for users, and who do not take the trouble to learn about media and design, especially from the history of their own field, should not get off lightly, and should not be rewarded for “making ‘normal’ weaker”.

Finally, a field without a real literature is almost tantamount to not being a field. A literature is a way to keep the great ideas in a genre fresh and in the present and future of thinking within the field. This certainly doesn’t exist in computing to any useful extent. Like a pop culture, computing is still most interested in what can be done without large learning curves, and where the doing is more important than the ramifications of the results. A literature is one of the environments where one can move from the simple and immediate to the larger and more important.

We need it!

As a true engineer, how do you figure out what are the right tools to invent to help you do your work?

I’ve worked with “true engineers” and I think of myself as an “enthusiastic hobbyist” with some “scientific leanings”. Despite this, much of what I’ve done over the last 50 years has very large engineering components (you have to build things in computing to vet your ideas).

If you are interested in engineering, then it is a very good idea to learn about great feats in the field over the years (and there are many). A fascinating course I had in the engineering high school I attended (Brooklyn Technical High School), was “Industrial Processes” — taught to all Freshmen: it was about how things are made, especially basic materials, and especially in quantity.

A good example of large engineering to look at is the Empire State Building. It was started just as the Depression was starting, and there was a sense there would be no more “tall buildings” for a while.

The builders — The Starrett Brothers — decided to make the building a tribute to their trade and art. From the time of the start of demolition to the time of occupancy was a little less than one year, and the whole thing was done with around 3000 people. The process was rhapsodic if you love engineering, and much can be said about how they did it. There are a number of terrific books, including a facsimile of the daily log of one of the supervisors.

For the purposes here, the brothers — who had built several other tall buildings in the city — were competing with a number of construction firms to get the bid. They were asked “What tools do you have for this job?”. The other firms assured the owners that they had all the tools needed. Paul Starrett said, “Not a blankety blank thing! Not even a pick and shovel. Gentlemen, this building of yours is going to present unusual problems. Ordinary building equipment won't be worth a damn on it. We'll buy and make new stuff, fitted for the job … That's what we do on every big job. It costs less than renting secondhand stuff, and it's more efficient.”

Some of the tools they built included a narrow gauge railroad for every floor of the building, enormous temporary elevators on the outside of the building, new methods of scheduling so accurate and comprehensive that the steel girders were still warm from the steel mills in Pittsburgh. And so forth.

We loved this story at Xerox Parc, and one of the several reasons for its great success “working on unusual problems” was the same as for the Empire State Building: making not just the stuff, but the stuff to make the stuff.

This is interesting to ponder, and it’s an example of a saying where the first order version is true, and the second order saying is the opposite but also true. First order is “don’t make your own tools, especially in hardware and software, computers, languages, operating systems, development systems; it’s a rathole you can get trapped in.”

Yep. But, “if you have the chops to make your own tools then you should and must” because you will be able to free yourself from other ideas, and unnecessary workarounds, etc.

The answer to this question depends on the chops you’ve developed. (Great engineering needs great chops!) So develop them well ahead of time if you are serious about any field.

Meanwhile it’s worth pondering just where computing is in comparison. Could we do “the largest of something” in less than a year and muster about 3000 people to do it? I don’t know of any such happening in our history. (Maybe we don’t know very much about the engineering of our own field …)

What books would Alan Kay recommend for people who want to become true engineers? How should they go about building projects?

As with science, medicine, counseling, and many other fields, I think there are issues of temperament as well as knowledge — there is a kind of “calling” here. And, I think, a kind of “attitude”.

Henry Petroski is an engineer who has written a number of very good books about engineering, and these can be perused to get a basic “feel from the outside”. Another excellent engineer who writes very well is Sam Florman. There are several excellent talks and essays by Richard Hamming …

A key part of engineering is the love of doing things, and especially directly, and especially doing them well. A large part of engineering grew out of “tinkering” (aka “hacking”) with added desires for “principled designing and making”, “integrity”, etc. All of the great engineers I know personally have deep moral streaks about what they do, and about why it “should be done as well as possible”. Part of the overlapping temperament with science is that of the “lab rat”, who is happiest in being finicky about an experiment or making a new piece of experimental apparatus.

If we make a Venn diagram of the historical progression of “STEM” we get an overlapping “TEMS” : Tinkering, Engineering, Mathematics, and Science. Most modern practitioners get good at all of them, and much of the best stuff is found in the sweet spot intersection of these. Great “do it” teams are formed from people who do a bit of all, but are most deep in one or two of the circles. I’ve had the most fun in my career working with great engineers, and having an engineering background from high school helped a lot (even though I strayed into science and math).

As far as advice is concerned, I’d urge not just tinkering and making, and not just getting fluent in all of TEMS, but to find internships etc where real things are being done, especially tough things. A lot can be learned by watching experts do their thing, and being able to do things along with them.

A big revelation to me was the “attitude” part that I found when I accidentally wound up in a grad school that was part of the ARPA community. The whole community was just “used to being able to fly by visions, and to make anything that was needed to make the visions real”. In that kind of culture, with that kind of confidence, and that kind of track record, it is much easier for learners to pick goals that really matter, and to find ways and colleagues to make them happen.

In software engineering, is it better to just figure it out and hack together a solution or to understand the fundamentals and then implement it?

Interesting range of answers so far …

The first question I would ask is whether you are taking the word “engineering” seriously or is your notion of “software engineering” an oxymoron, or an ancient sense of the word, as in “Library Science”.

If you want to take “engineering” seriously, then it is worth looking at how it is done outside of software in civil works, aviation, electronics, bio, etc. All of these rely heavily on scientific theory combined with pragmatic model building and testing, and these use powerful CAD tools combined with SIMulators, and very often then linked to FABricating facilities to make both extended prototypes and eventually many instances of the desired artifacts. (By contrast, the tools most computerists make for themselves are quite terrible, even embarrassing! For example, Vi, staying with edit-compile-load cycles, not using SW simulators, etc.)

In praise of real engineering in computing, there is nothing more fun than to be a scientist-engineer working with an engineer-scientist (a great example of the latter was Chuck Thacker at Xerox Parc).

My favorite line in the movie “Spy Game” is the spy talking to his assistant: “When did Noah build the ark Gladys? Before the flood, before the flood”. Here meaning that if you are going to have situations where huge deadlines are put before you, then as a good “real engineer” you will have prepared by having great tools already in place. If you haven’t done this, you are not taking “software engineering” seriously enough.

As for “exploratory programming” — it is very important to do almost always — the trick is to be able to do it so easily and quickly that one is not tempted to save it. I liked to do explorations — and found Smalltalk in its day really excellent for this — in order to get a better handle on the design parts of new ideas and goals. This kind of programming feels to me just like the pencil and pen sketches one does to understand form, and of which very few to none are adapted directly into the actual work (thank goodness for difficult to erase media (these force more visualizations and separate tests to be done)). These are also like the physical models of bridges, etc that are illuminating and necessary but not incorporated into the actual constructions.

The last idea here is that many of the properties that help prototyping and exploration -can- be of great help in serious implementation: for example, how well does your programming language allow separation of “meaning” from “optimization”? Most languages and most programmers intertwine the two very different things, and this leads to difficult to understand and deal with dependencies, especially later in the learning curve when you are really finding out how your system should be implemented — and worse, much later when requirements have changed and the SW needs to change also. If you have a typical — bad — system for where we are in the 21st century, you will likely try to patch rather than -safely- redesign within what has been done so far (hint: this is generally not a good method for any kind of engineering, including in software).
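One small way to picture the “separate the meaning from the optimization” point (a hypothetical example of mine, not a prescription for any particular system): keep an obviously correct specification around, and let any optimized version be swapped in only as long as it keeps agreeing with that specification.

    # A tiny illustration of keeping "meaning" separate from "optimization".

    import random

    def sorted_spec(xs):
        """The meaning: selection by repeated minimum; obviously correct, not fast."""
        xs, out = list(xs), []
        while xs:
            out.append(min(xs))
            xs.remove(out[-1])
        return out

    def sorted_fast(xs):
        """The optimization: any clever implementation can be swapped in here,
        as long as it keeps agreeing with the specification above."""
        return sorted(xs)   # stand-in for the hand-tuned version

    for _ in range(1000):
        data = [random.randint(-50, 50) for _ in range(random.randint(0, 20))]
        assert sorted_fast(data) == sorted_spec(data)
    print("the optimized version still agrees with the meaning")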

What do you think the next great invention of the 21st century will be? One as big as the internet. This idea of synthetic realities is something I see having huge potential.

Really smart people — like Ivan Sutherland — can figure out most of the implications of new ideas and technologies from the get go. Take a look at his 1965 paper The Ultimate Display and see what you think about your question. Then you could take a look to see where VR and tactile sensing, etc., actually came from, etc.

What are those "powerful ideas that basically mean the difference between life and death"?

You can identify many of them. Two types are (a) “powerful ideas that have immediate benefit on life and death” — for example, the germ theory of disease and what is implied for sanitation, being able to make insulin to save diabetics, and a myriad more having to do with health and nutrition. And (b) “powerful ideas that are about ideas” — for example, “there are powerful ideas” (it’s worth finding them!), “we are mostly blind in so many ways” (so use powerful ideas to make tools of many kinds to help us “see” — one of them is “science”, etc.).

An example of a problem that involves both (a) and (b) is the estimated 100,000 to as many as 400,000 patients that die unnecessarily each year — many in hospitals — because their doctors are not careful about sanitation. Here we combine the powerful idea of agents too small to see that can kill us and can be spread by contact, with trained humans who have learned about these agents, and how to deal with them, but whose commonsense part of their brains keeps them forgetting about the very germs they learned about and actually saw in microscopes.

In the large, it is making progress in (b) and forming deep habits around (b) that should be one of the main goals of education. We need to do much more “real thinking” rather than the kinds of “reacting and rationalizing” that are our genetic heritage.

Why did Xerox fail to successfully exploit any of its ideas at its Palo Alto Research Center from the 1960s and 1970s?

A common myth (and one easily exposed with a little poking around the web). The simplest error is “the 60s”: Xerox Parc started in 1970.

Depending on how you count, there were around 7–9 major computer related inventions from Xerox Parc in the 70s. One of them was the laser printer (by Gary Starkweather). Xerox made many billions of dollars from this invention and paid off the entire investment in Parc many times over.

I think this myth — this lie — was made — or at least promulgated — by business people who didn’t want to invest in these kinds of research processes, despite the enormous return on investment to everyone.

It’s true that Xerox could have done more with some of the other inventions, but that isn’t the point here. Consider that most of the inventions were “new industry making”, and needed to be accessible by all, as was the Internet (done by the larger ARPA/Parc culture).

My biggest complaint has been that the commercial exploitation of most of the inventions was “half-way and in pieces”, and this did not get fixed in the succeeding 35 years or so.

What is Alan Kay’s opinion of Assembly and mainstream programming languages like Google Go?

John von Neumann was famously against using valuable expensive computer time to automate human tasks that could be done by cheap labor. When FORTRAN came out in 1956 someone asked him about “higher level programming”. He summoned a grad student and said “Go write me a sort program”. The grad student left. von Neumann said “Either that student will come back with a sort program, or he won’t come back at all. That’s higher level programming!”

This story is too good to be true, but I heard it about 6 years after it happened, from my boss at NCAR — Glenn Lewis — who was a grad student of von Neumann’s at the time. (And an assembler was a waste of computer time according to JvN when you had students.)

By contrast, Ivan Sutherland in 1962 while inventing modern interactive computer graphics and object-oriented models, also decided to have the user “program” Sketchpad via simply giving it goals to accomplish (and completely bypassing the kinds of mechanisms of hardware digital computers of the time). The “compiler” of Sketchpad was a constantly running quite comprehensive “solver”. (It’s worth realizing here that eventually most of the programming of the future will be done this way — and it’s worth trying to understand why CS hasn’t really gotten to this after all these years.)

In any case, pretty much all of the good questions about programming were being asked by the mid-60s, more than 50 years ago. Some of the machine architectures for answers to the questions were better than those available now.

But still, the conflicts are virtually the same — programmers need to think in terms of desired relationships, but for advancing will need to do optimization of various kinds to get the throughput needed. And a lot of optimization comes down to finding ways to violate useful module boundaries without shafting the system, often by dipping lower into the causality structures.

So the real question (I think) is not the above, but “how much of what kinds of code should be the max that can be stood”.

Looking at this would greatly help one of the main problems of any kind and level of language, and that is that most programmers wind up writing too much code, and especially code for which the language is poorly expressive.

This is ironic — or worse — because one of the main points of a computer is that you can make new computers with them. A higher level language — or a low level one — besides doing what it was made for, can also make the escape from its own prison by allowing more suitable languages to be designed and made.

In part, that’s why we have so many languages — but for various — mostly bad — reasons they wind up as stove pipes rather than as a system of tools, each to be used where they really pay their way. But most computer people are still stuck in “algorithms and data structures” and haven’t yet really gotten around to thinking in terms of systems …

"If you don't fail at least 90 percent of the time, you're not aiming high enough." - Alan Kay. What are some things Alan Kay has aimed for and failed at?

Most things. (Consider the world of the idealist …)

A big turning point in my late 20s was to gradually move to evaluating on “quality of effort” rather than “quality of results” (still wanting the latter to be high, but eventually recognizing the real conflicts). Picasso had a great line for all of us: “A work of art is never finished, it is only abandoned”.

There’s a nice intertwining with the art of science here: science doesn’t get “truth”, but it can get “something good”, something above important thresholds of illumination, something that can save lives, that can lift lives.

Added July 10, 2021

Certainly one thing that I’ve put many years of effort into without making much of a dent is trying to improve education — especially public education — for elementary aged through middle school children. This definitely qualifies for “failing more than 90% of the time”! and I think also “for aiming high”.

In Alan Kay’s answer about the best books to understand CS concepts, he mentioned that he would look for an excellent follow-on book to the “The Pattern on the Stone” book. Has he found it?

There probably is at least one really good book out there that could be a follow-on to Danny’s “Pattern on the Stone” book — but I haven’t come across it. Perhaps some of the Quora readers can make suggestions.

I’ve known Danny since he was a teenager, and he has been a colleague and friend for many years. When he wrote his book in the late 90s, I was very taken by it, and asked if he would mind if I wrote the “next logical level” for it — he thought that would be great. However, as John Lennon once said “Life is what happens while you are making plans”, and the project never materialized.

I think today, one could take the 9 chapters in the book and make the next logical level be online experiences in actually building the types of mechanisms that Danny introduces. Not just the programming parts, but the “nuts and bolts” primary hardware parts. There is enough computing power on personal computers today to do workable gates-and-wires simulations of whole computers. The Silverman brothers — and others — have done some wonderful examples of “oldies-but-goodies” — for example the recreation of the classic 8 bit micro CPU the 6502.

What’s needed for the next level of Danny’s book here would be a very simple CPU design (like the original ARM but simpler) that could be made online from just a few layers of architecture, and especially terrific simple CAD-SIM tools for beginners to help make the structures.

Similarly, it would be fun to do a very simple bootstrap of a very powerful programming language (e.g. Lisp shows that super powerful can be super simple — this possibility of powerful-and-simple is one of the biggest ideas in both systems and computing).

I think I would do the bootstrap somewhat in the style of Schorre’s Meta II (that was almost the way we did the original Smalltalk).

The last two chapters in Danny’s book would be the most fun and the most challenging to create an accessible level for “AI” and “biological brains”.

The term “AI” today is used for such a restricted subset of “behavior” that the new term “Artificial General Intelligence” now has to be used for what “AI” used to mean (this is sometimes called “colonization”: when people want the cachet without the work, they often appropriate the term for what it is that they are willing to do — this also happened to “objects” in the 80s — we now have to call the original ideas “real objects”).

I think it would do the readers of our hypothetical book a real service if we helped them understand “real AI” by including a section on “real thinking” (as opposed to just massive correlations).

And there’s a lot we could do to help readers understand biological nervous systems (perhaps by having them make some “organoids”?).

Summing up, I think Danny did a really great job in writing a “concepts” book for the non-computer savvy reader. I think there is a real need for a “concepts-by-making-them” followup that is both real and very simple. A lot of the complexity in computing of all kinds has come from the admixture of optimizations, many historically needed, and also from poor designs.

But for learners today with the computing power available today, a lot can be done with really simple mechanisms that have the same graceful style as Danny’s and can be made and debugged and understood by the reader.

What are some new kinds of businesses you would like to see?

The economist Milton Friedman — perhaps the epitome of “dismal” in “the dismal science”? — famously declared that the sole purpose of business is to make money for the shareholders.

A larger deeper mind — Peter Drucker — countered with the idea that the purpose of business is to best serve the customers.

One of my favorite businessfolks — Spedan Lewis — formed the John Lewis Partnership in the UK to make a company of hundreds of stores — John Lewis department stores and Waitrose Markets — that are completely owned by all of the employees (embracing Drucker’s idea with a new important dimension).

If we step back to look at the larger processes going on, we find the painful paradox that the -potential for doing- (a deep way to look at wealth) in our societies stems primarily from large cooperative organizations such as science and democracy: all wealth ultimately comes from the energy we can extract and turn to our purpose, and from the open ended societies that allow creative progressive directions to be chosen by the participants.

But, squinting our eyes at smaller groupings within this, we find what appear to be myriad tribes of neolithic pre-agricultural hunter-gatherers exploiting the lush valleys they find around them, moving on to the next when there are no more resources easily found, and competing in every way with the other tribes around them. Metaphorically, and sometimes literally, stripmining society.

To find atavistic behavior in modern times is not surprising — we are genetically predisposed for many kinds of such behavior — but the lack of understanding of how to really make progress given that the history of the last several hundred years is readily available is not just surprising, but shocking, and now: quite dangerous.

If one of the perspectives we can take on our situation is systems, then we need to consider “The systems that we live in, and the systems that we are” — for example, the intertwining of: the universe, our planet, our social-governance, our technologies, ourselves, our mentalities.

Cooperative organization has produced great synergies in many areas, and the non-understanding that systems organizations are only dynamically stable and have toppling points now threatens our life on this planet.

The powers that need to be better handled are both material and informational — the largest powers of our time are communicational.

As Vi Hart has pointed out “We need to ensure that human wisdom exceeds human power”.

The kinds of businesses — new and old — I’d like to see are those that also understand that their very ability to exist depends on the larger systems working well, and that it is a primary function of all systems to increase in the large both real progress and stability.

In other words, it’s not about “hunting-and-gathering” but more like gardening, and not just for food: it is about the “gardening and nurturing of humanity”.

What are Alan Kay's favorite books?

I don’t know how to give a reasonable answer to this question, but I don’t want to ignore the question either.

I have friends, and know of people, who have certainly read more books than I have, but I’ve read enough to be stymied by questions about “favorites”. I have never tried to count them (it wasn’t a competition), but when asked about this a few years ago I estimated “likely not more than 20,000”.

Not a lot, when one realizes that the Library of Congress claims more than 23 million books. Yikes!

In the 80s I was asked by a consulting company for my “top 10 books”, and came up with several hundred. This list has been around via various hands, and one version of it — Alan Kay’s Reading List — was put online a few years ago.

One of the ways to think about this is that there are many more than “10 Great Ideas” and each one would likely at least be a book — and a good idea will probably give rise to at least one good book — and therefore we are really constricting our view to think of 10 or even 100 or maybe even 1000 top (and favorite) books.

However, there have been books over the years that have seemed really special for various reasons — I think of these as “top 10” even though there are probably many more than 10 if I were to try to write them down.

Just to take one as an example: Newton’s “Principia”. I have never forgotten the combined shock and thrill of making my way through this in my 20s. The greatest single leap ever? That’s probably too hyperbolic, but “oh my goodness”. Add in the amazing style and organization that he picked for this book, and it seems to me to shine above all others.

A formative book in so many ways was “20,000 Leagues Under The Sea” by Jules Verne. The first translation I read as a kid was not the best one (I recommend the Anthony Bonner translation today). But “Capitaine Nemo, c'est moi”.

Ray Bradbury wrote a beautiful musing that compared “20,000 Leagues” with “Moby Dick” …

Much earlier, the first adult book I read all the way through was my father’s copy of “Mythology” by Edith Hamilton. I had one of my first “mental wakeup calls” when I got to the Norse gods part and realized they were quite parallel in character and process to the Greek and Roman gods in the front part of the book. This got me to realize these “were just stories” that had been made up for reasons I didn’t yet understand. A few more books like this, and by the time I got to first grade I had been ruined for the “single book — single truth” ideas of school and church.

Another book that I’ve loved — as a lovely book to read — was the original “Molecular Biology Of The Gene” by Jim Watson (ca 1965) — and its truly wonderful offspring “The Molecular Biology Of The Cell” by Bruce Alberts et al (with Watson as one of the authors). This is a favorite for a different reason: for many years it has been the best single volume narrative of “life from scratch” (I particularly love the 3rd Edition).

A favorite book is “The Lisp 1.5 Programmer's Manual” by John McCarthy, et al. I have bought many copies of this book, both for its mojo and to give as gifts. It has been published by MIT Press since the early 60s, and is still in its original typescript form. Why this book? It’s kind of “the way things should be done”. The front section — by John — is especially fine and is a straightforward introduction to the language done in a kind of axiomatic style. Starting with a few primitive functions, he writes other functions showing how to manipulate symbols, and finally uses some of these to write the entire language in itself on the bottom half of page 13. I have called this the “Maxwell’s Equations” of computing, because it presents a very large part of what’s important about programming languages “in an eyeful” and “in a brainful”, which provokes and allows further “nifty thoughts”.
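
For readers who want a taste of this without the book in hand, here is a tiny sketch in Python of the general shape of that page-13 interpreter (an illustration only, with made-up names, and nowhere near McCarthy’s actual code):

    # A toy evaluator in the spirit of "a language written in itself" -- not Lisp 1.5.
    def l_eval(exp, env):
        if isinstance(exp, str):                  # a symbol: look it up
            return env[exp]
        if not isinstance(exp, list):             # numbers etc. are self-evaluating
            return exp
        op, *args = exp
        if op == 'quote':                         # (quote x) -> x, unevaluated
            return args[0]
        if op == 'cond':                          # (cond (test result) ...)
            for test, result in args:
                if l_eval(test, env):
                    return l_eval(result, env)
            return None
        if op == 'lambda':                        # (lambda (params) body)
            params, body = args
            return lambda *vals: l_eval(body, {**env, **dict(zip(params, vals))})
        fn = l_eval(op, env)                      # otherwise apply a function to evaluated args
        return fn(*[l_eval(a, env) for a in args])

    env = {'car': lambda x: x[0], 'cdr': lambda x: x[1:], 'atom': lambda x: not isinstance(x, list)}
    print(l_eval(['car', ['quote', ['a', 'b', 'c']]], env))   # -> 'a'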

And to pick another favorite: “Computation: Finite And Infinite Machines” by Marvin Minsky. This is another really beautiful book, in approach, style, humor, and depth. It’s essentially a math book, but one doesn’t have to bring much math to it, just one’s mind that will be happy to touch some mathematical thinking. It was from this book — and an amazing Marvin way to show a Lisp like language using Goedel Numbers (hee hee) — that I first learned about Lisp 50 years ago.

But how could I leave out a very different kind of book: “The Organ-Builder” by François Bédos de Celles? This is a several-volume set from Diderot’s 18th century many-volumed Encyclopedia, which aimed to capture everything that was known in his day. In this case “really quite everything about organ building in France”. The edition I have was from a subscription for a beautiful translation by Charles Ferguson. One of the volumes has large reproductions of the many hundreds of engravings showing every technique, every tool, every pipe kind and scaling, etc. I was very lucky 30 years ago to help master organ builder Greg Harrold build a mid-18th century organ (for Berkeley) that only used these techniques and “showed no modern tool marks”.

For favorites, I have to include “Art In The Blood — A Sherlock Holmes Adventure” written by my wife Bonnie MacBird (the original writer of Tron), and also the website that includes her annotations and original drawings. This was a labor of love on her part, and she completely nailed the Arthur Conan Doyle voice of the characters and narrative, while being able to carry a marvelous story into the much larger realm of the novel (a difficult territory for Holmes stories, and one that Doyle himself never entered).

And so it goes. We could all make out a list like this.

In the reading list mentioned above, I often said “any book by author X”, and I still think more in terms of authors than single books (I have a feeling most extensive readers do). It’s not that every book by a particular author will be equally enthralling, but I’ve found it really worthwhile to try to read all the books by an author if my first encounter was positive.

I should also point out that it’s the fate of most ideas — and most books — to not be great. But, again, reading a lot of books will reveal that many not so good books somehow help a bit with the great ones. This is because context is the larger part of understanding. It provides the multiple perspectives we need to avoid religious feelings about any of our ideas. From this standpoint “most books help at least a little”.

I will probably return to this answer a few times to tweak it up.

What if we were able to prove that AGI is beyond the computational power of the Turing machine?

I think Cherie Woo below has a good approach to this (it was the philosophical approach used in the 60s for doubters of “eventual AI”).

Philosophically, the question is “Can we in principle simulate physics well enough to make the biology we have, and to thus make the brains that we have?”

I think the answer is “probably”.

Engineeringwise, the question is “Can we simulate AGI efficiently enough to think the same kinds of thoughts (or better) that we think?”.

I think the answer is “probably”. (And almost certainly the solutions will not be at the level of physics.)

To what extent, if at all, did Smalltalk borrow from Lisp's philosophy or ideas?

As Mark Miller noted below, a fairly complete history of Smalltalk can be found in The Early History Of Smalltalk. Part of the history is an explanation of how long it took me to see the deep implications of what was in front of me (this is because I started with a conventional mindset about computers and languages, and that made me pretty blind to what was new and important).

The simple answer is Yes!

In the context of Biology and Mathematics, the object ideas came primarily from Sketchpad and Simula (and cells), the messaging ideas came from thinking about objects as being like cooperative processes in a time-sharing system or between computers on a network (and cells), and a lot of the metathoughts and underpinnings were directly influenced by Lisp (and math and cells).

There were a number of revolutionary architectures to see in the early 60s — Sketchpad, Lisp (basically Lisp 1.5), the B5000 (a computer that could directly execute byte-codes completely safely, and was essentially a multiprocess “Simula machine” before there was a Simula), and APL (not yet implemented, but in Iverson’s book). And even Algol (with its recursion, call by name, nested block structure, etc.) was revolutionary at the time. (And there was quite a lot more to see as well, especially some of the meta systems for generating languages from metalanguages, etc.)

Just sticking with Lisp here, what was cosmically interesting was the idea of Lisp, meaning that there could be something mathematical about programming that could also be practical — that is: you could do something with relations and inference — and most especially, that a powerful universal language could also act as a practical metalanguage — not just for implementing, but for thinking about programming and programming languages.

For example, if you look at McCarthy’s beautiful lead up to the bottom of page 13 in The Lisp 1.5 Programmer's Manual, you can see how John thought about these ideas. And once the whammy of the half page interpreter in itself has been grokked, then you will start thinking new thoughts about the semantics of programming and programming languages.

It is also worth looking at some of the rest of this little book to see how the implementers of Lisp went about their business. One thing that made it into the 1.5 version was the realization that “special forms” — which seemed to be called for by the functional underpinning of Lisp — could be thought of as a base “functlet” delayed parameter evaluation idea, and that normal functions could just be a special case of this. (Many different interesting thoughts came out of this realization, which in turn came out of needs for implementing “special forms” such as QUOTE, LAMBDA, COND, etc.)
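
One way to see the idea is with a small sketch in Python (an illustration, not Lisp 1.5 itself): every argument is passed as an unevaluated thunk, the receiver decides what to evaluate, and ordinary call-by-value functions fall out as the special case “evaluate everything first”.

    # Pass arguments as unevaluated "thunks"; the receiver chooses what to evaluate.
    def my_if(test, then, otherwise):     # a "special form": evaluates only one branch
        return then() if test() else otherwise()

    def strict(f):                        # ordinary functions: force all thunks up front
        return lambda *thunks: f(*[t() for t in thunks])

    add = strict(lambda a, b: a + b)

    x = 0
    # Only the chosen branch runs, so the division by zero is never touched.
    print(my_if(lambda: x == 0, lambda: "undefined", lambda: 1 / x))   # -> undefined
    print(add(lambda: 2, lambda: 3))                                   # -> 5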

It’s also worth looking to see how the practical dynamics of implementing and using overwhelmed some of the cleanliness of the initial ideas. Part of what went on happened “too fast” for the kind of thinking needed for great design. This led to some kludges. (And very similar things happened with Smalltalk later on once we started to use various versions of it.)

Some of the kludges and not-quite solutions came about with regard to functional arguments, the “prog” feature, scoping of variables, closures (Lisp 1.85 at BBN had four different, and interesting, notions of “closure”), etc.

Another kludge was leaving out John’s “m-language”, which was supposed to be a much more readable way to do Lisp programming, in favor of just programming in the “machine code” of S-expressions. (There is another whole history of the ramifications of this “basically lazy” choice of the early implementers.)

Most of these kludges were cleaned up, sometimes years later, and some in other languages. Interestingly, McCarthy had had some influence on Algol (for example, Algol conditionals), and Algol had some of the solutions that Lisp needed (such as static nesting scope, and the “call by name” mechanism for functional arguments — however, it is worth noting what was positive about the dynamic variable scoping, especially in some AI reasoning processes … it would have been interesting to allow both …).

But the Lisp culture at MIT was quite in its own world. It is interesting to read the Scheme design documents from years later to see the “rediscovery” in the 70s of some of the Algol mechanisms (some by way of the work of Peter Landin in the 60s).

Still, even with kludges, there is a lot to think about here, and in compact forms that leave some brain-cells left for actually pondering programming semantics.

All this “understandable kernel” lore should lead to variant thoughts — especially these days when there is much experience that can be brought to bear.

Here’s my favorite “most important unknown thesis in programming language thinking”: Dave Fisher’s 1970 CMU thesis Control Structures For Programming Languages. He uses the “Lisp approach” in a very sophisticated way to make a deeply meta extensible language. Much can be learned about many important things from grokking the first 100 pages of this work.

The other quite wonderful much more recent piece of work that needs to be mentioned here is “The Art of the Metaobject Protocol” by Kiczales et al. (MIT Press). This is deep deep “Lisp thinking” in the service of objects, and vice versa.

Just to mention one interesting “variant thought” to close off this too long answer: it’s worth pondering the split of “eval” and “apply” in McCarthy’s interpreter. “Apply” is tantamount to sending a message to an object, especially if parameter evaluation can be controlled on the receiving end by the object itself, and especially if the receiver is a closure. Now just make this whole mechanism as loosely bound as the rest of the ideas in Lisp, and you have a very simple basis for a different look at computation and programming. (And don’t forget to note the implications if everything were actually in the form of property lists, not just raw storage cells … .)
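
Here is a small sketch of that “variant thought” in Python (again only an illustration, not anyone’s actual system): an “object” is just a closure that receives a message name plus unevaluated arguments and decides for itself what to do with them.

    # "Apply" as message send: the receiver controls evaluation of its arguments.
    def make_counter():
        state = {'n': 0}
        def receive(message, *arg_thunks):
            if message == 'increment':
                state['n'] += 1
                return state['n']
            if message == 'add':
                state['n'] += arg_thunks[0]()   # the receiver chooses when to evaluate
                return state['n']
            if message == 'value':
                return state['n']
            raise ValueError('message not understood: ' + message)
        return receive

    counter = make_counter()
    counter('increment')
    counter('add', lambda: 10)
    print(counter('value'))   # -> 11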

What tools are useful when outlining and drafting a CS dissertation?

The number one tool is to have good results that are worth writing up. Too many dissertations these days are too incremental, and often from small papers done over the course of grad school — try to work on a real problem, and get real results. Dave Evans used to say that “A PhD is two years of world class research”. If it was a huge problem, then he allowed the first two years of real progress to be the dissertation. If it was a big problem, then he wanted a solution. He didn’t want anything that wasn’t a real advance to the state of the art.

Ivan Sutherland, when asked “What is a PhD thesis?”, said “It’s something you can get three people to sign!”

So, first talk to your committee — and if things are early in the game, you can take the best actions by helping to put together your committee — especially to find a sympathetic outside member.

I would next suggest that you read a number of theses to get a sense of their form and rhetoric, and especially to find several that seem to fit your topic and argument — and personal style. And, for example, what are theses like at your institution?

Ivan’s thesis on Sketchpad is a classic in every sense. Many of Marvin Minsky’s students in the 60s and 70s wrote classic “MIT theses”, in which the first few chapters, especially, were extremely well structured.

I don’t understand what you mean by “accumulate points” — that sounds ominous.

I’d try to avoid LaTeX if possible — and if not, to just use it at the very end.

Some people can plan and write top down, and some will have a much easier time simply organizing ideas to try to find a good sequence. In either case, if you care about the writing at all, you will need to put together something that is a kind of narrative.

One of the ways to do this is to see if you can write a one or two page summary of the whole thing, or perhaps to write a slightly longer paper, as it might appear in a proceedings. This will provide an overall narrative structure that can be more useful than an outline. The two main ways to expand this are to (a) add in details, or (b) keep the short paper as the intro chapter, and think of the rest as “appendices” that contain details and references.

What are the pros and cons of developing software from the top-down and from the bottom-up?

Somewhat harmonious with some of the other answers: try making a small scratch throwaway working model first to get an overall sense of the system design and relationships needed.

This is especially useful when working in a language system that has a huge library of already made stuff (which often tempts “creation by searching for features” rather than thinking about the actual problem). In other words, it is often helpful to treat a library as possible optimizations that might later be useful, but design is usually best if optimizations are left out in the first pass.

This route is most useful and fun if the language and system is amenable to quick “sketches”. Too much work and delay leads to hacks and tendencies to retain the throwaway model (the chances are low that the first sketches are what you actually want to do, so things have to allow the designer/prototyper to be able to move on without a feeling of loss).

Just as an example, “workspace programming” in Smalltalk could have been even better, but it allowed a lot to get tried out before making big commitments to class relationships, and all the code would be a single “eyeful” in the workspace. One thing that helped was that a lot could be done in Smalltalk in a few lines, and this kept the experimental code small enough to “be thinkable” rather than just “doable”.

And, but, … consider the difficulties involved in taking a completely malleable lump of clay and trying to debug it into something beautiful! What’s needed is a “vision/sense” of what is to be created. (This is why “sketches” need to be thrown away — their purpose is to help form the inner senses …)

What do Alan Kay and other message-oriented programming pioneers think about Ruby?

The real question is: how should programming really be done ca 2017? And let’s also pick a check point around 2027 to help think about my first question.

Has PARC (formerly Xerox PARC) become just another company? Has it lost its original identity?

Parc was four laboratories: Physics, Optics, Computer Science, and Systems. The last two were the most associated with what people think of as “Parc”, and the deeds most associated with “Parc” were mostly done between 1971 and 1983 when Bob Taylor and most of his lab left (I left in 1980). Some good people remained. And there were lots of very good researchers in the other two labs.

In my mind “Parc” ended when Taylor and his main people left. As you can see, this is not completely fair to those who remained. Still, to me the place was about top researchers having the resources to carry out their research. When “top-downess” gradually was imposed starting in the late 70s, that was the beginning of the end.

How much cash did Xerox have on hand when it formed Xerox PARC?

Worth checking. My recollection is that they were about a $6B/year (1970–71 dollars) company at the time Parc was set up.

What kind of cheese makes the best grilled cheese sandwich? Why?

Kind of redundant given all the answers, but this is the kind of dish that we react to personally … a great grilled cheese sandwich is my favorite comfort food, and my notion of it is from the ones my mother used to make for us in New England in the 40s and NYC in the 50s.

One of the New England tastes is brown butter on bread, and the rolls made for hot dogs and “lobsta” are white on the outside so they can be lavished with butter and then browned in a pan. And this is also the case with grilled cheese sandwiches (at that time called “toasted cheese sandwiches”).

The bread back then was “Wonder Bread” (kind of terrible air puff, but redeemed by frying in butter), and the cheese was “processed American” or “doubly processed American” (i.e. Velveeta).

My dad was Australian and loved sharp cheddar cheese with apples. Sometimes he would part with some cheddar for the grilled cheese sandwiches. Sometimes he would pop apple slices into the sandwiches — not bad but to me missing the essential taste.

It was years before I found out that “bread” could taste good (or even have taste).

My favorite version 70 years later uses multigrain/nut bread which really grills up wonderfully, and the sharp cheddar of the kind my dad loved. Great French baguettes are also terrific when grilled. I like to put a plate on top to weigh the sandwich down in the pan, and to cook each side just short of burning the butter, so the result is very dark and at its tastiest.

What is the meaning of life in 5 words?

“We make our own meanings”

What was it like to be at Xerox PARC when Steve Jobs visited?

A good enough answer would be longer than is reasonable for Quora, but I can supply a few comments to highlight just how little attention is paid in the media, histories, and by most people to find out what actually happened. For example, I was present at the visit and demo, and it was the work of my group and myself that Steve saw, yet the Quora question is the first time that anyone has asked me what happened. (Worth pondering that interesting fact!)

First, it’s worth understanding that many people (perhaps even a thousand or more) had seen live demos of the Alto and Smalltalk before Steve. This is because Steve showed up in 1979, and the Alto and Smalltalk had been running for 6 years (starting in the first half of 1973), and we were a relatively open lab for visiting colleagues and other interested people (like Herbie Hancock and Al Gore).

Many more people had read articles that I’d written (e.g. in Scientific American, Sept 1977), and one with Adele Goldberg (in IEEE Computer March 1977). These included many screen shots and a wide variety of Smalltalk media. For example, here is the provocative title of the 1977 SciAm article:

[image]

And a picture of the Alto being used by children:

[image]

And a picture of the Alto screen and caption from the SciAm article in 1977 (SciAm did a bad scan of this; if I have time I’ll find the physical issue and do a better scan):

[image]

The readership of SciAm in those days was about 2.5 million, so a lot of people saw this.

A second important fact about the 1979 demo to Steve was that he missed most of what we showed him. More than 15 years later he admits this in this interview: How Steve Jobs got the ideas of GUI from XEROX, where he says that we showed him three things, but he was so blinded by the first one (the GUI) that he missed both networking and real object-oriented systems programming. (A fun part of this is that Steve, after praising the GUI to the skies, realizes what he’s saying and immediately says “but it was flawed and incomplete”, etc. This was his way of trying to be “top gun” when in a room where he wasn’t the smartest person.)

And, actually, he missed a few things about the GUI (for example, that it had unlimited and persistent “desktops” which could be used to sustain work/projects over time without having to tear down and build up, and without stovepiped apps, etc.)

The demo itself was fun to watch — basically a tag team of Dan Ingalls and Larry Tesler showing many kinds of things to Steve and the several Apple people he brought with him. One of Steve’s ways to feel in control was to object to things that were actually OK, and he did this a few times — but in each case Dan and Larry were able to make the changes to meet the objections on the fly because Smalltalk was not only the most advanced programming language of its time, it was also live at every level, and no change required more than 1/4 second to take effect.

One objection was that the text scrolling was line by line and Steve said “Can’t this be smooth?”. In a few seconds Dan made the change. Another more interesting objection was to the complementation of the text that was used (as today) to indicate a selection. Steve said “Can’t that be an outline?”. Standing in the back of the room, I held my breath a bit (this seemed hard to fix on the fly). But again, Dan Ingalls instantly saw a very clever way to do this (by selecting the text as usual, then doing this again with the selection displaced by a few pixels — this left a dark outline around the selection and made the interior clear). Again this was done in a few seconds, and voila!
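
For those curious why the trick works, here is a toy sketch in Python (plain lists standing in for the bitmap; nothing here is Smalltalk code): complementing a rectangle twice, the second time displaced by a pixel, cancels in the overlap and leaves only a thin dark rim.

    W, H = 12, 6
    screen = [[0] * W for _ in range(H)]

    def invert_rect(img, x, y, w, h):
        for r in range(y, y + h):
            for c in range(x, x + w):
                img[r][c] ^= 1        # complement the pixel

    invert_rect(screen, 2, 1, 8, 4)   # "select" the text
    invert_rect(screen, 3, 2, 8, 4)   # select again, displaced by one pixel each way

    for row in screen:
        print(''.join('#' if p else '.' for p in row))
    # The interiors cancel back to clear; only a one-pixel dark outline remains.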

The Smalltalk used in this demo was my personal favorite (-78) that was done for the first portable computer (the Parc Notetaker), but it also ran on the more powerful Dorado computer. For a fun “Christmas project” in 2014, several of us (with Dan Ingalls and Bert Freudenberg doing the heavy lifting) got a version of this going (it had been saved from a disk pack that Xerox had thrown away).

I was able to use this rescued version to make all the visuals for a tribute to Ted Nelson without any new capabilities required. The main difference in the tribute is that the revived version had much more RAM to work with, and this allowed more bit-map images to be used. This is on YouTube, and it might be interesting for readers to see what this system could do in 1978–79.

https://youtu.be/AnrlSqtpOkw?t=142

Who was the first to introduce multiple desktops, Linux or macOS?

I hate to bring up the “P” word, but Smalltalk at Parc had unlimited persistent multiple desktops starting in the mid-70s with Smalltalk-76. They were called “projects”. All Smalltalk systems since have this as part of the system’s offerings.

You can see a revival of a Smalltalk-78 system in the tribute video I did for Ted Nelson a few years ago: Alan Kay's tribute to Ted Nelson at "Intertwingled" Fest. I did all the visuals for the tribute in Smalltalk-78.

The multiple desktops are used heavily (note that besides supplying multiple workplaces, they also create a much richer media environment than apps like PowerPoint, because in Smalltalk you can bring any and all objects to each “place” and integrate them — and they are “live” not canned).

I tried for many years to get Apple to adopt this (and a few other good ideas from the 60s and 70s) but to no avail.

What is your review of The HP Way (1995 book)?

Interesting that Quora will not allow me to write an additional answer … so I’ll flesh out what I wrote previously, which starts with:

I’m currently traveling, and would like to take a peek at the book to make sure I remember it from getting and reading it when it first came out — so I’ll postpone much of the review until I get back and can find it in my library.

I couldn’t find the book (from 1995) in my library so I ordered the paperback version that was published in 2005, which included several new highly valuable additions, including an excellent foreword by Jim Collins (author of the book Built To Last) that summarizes much of what is important about “The HP Way”, and a transcript of a speech Dave Packard gave to the HP managers in 1960. Both of these are worth the price of the book. In the previous note I then picked up with:

But I can say that HP in the 60s and 70s was certainly one of my all time favorite companies, and I visited quite a number of times when I was a grad student and later at Parc. I knew Bill very slightly, did not know Dave, and was friendly with a number of other HP luminaries (such as Tom Osborne, one of the main forces behind the HP-35 “pocket computer”).

What was impressive to me about HP initially was the large number and range of extremely high quality products (eventually thousands). Later, I was fascinated by the “HP process” of “intrapreneuring” — how they were able to invent and also take products to market. Still later, during and after the Xerox experience, I was -really- impressed with their process.

The main body of the book has much of historical interest — not just the rise of the company, but how they thought about what kind of a company it should be and become, how it should grow, and much of interest about growing pains, and a number of the routes they decided to take to deal with the immense scale-up that the company created. Readers will find much in common with some of my other Quora answers about the ARPA/Parc research community.

However — and especially given the difficulties and absences of today’s funding for long-range research — what should be of most interest is how HP was able to fund and do a considerable amount of research internally and combine this with its amazing “intrapreneurial” processes — not just “innovation” to take products into manufacturing and to market, but doing this in an atmosphere of thousands of different kinds of competing products. The Collins book is a bit more useful than “The HP Way” on this score: Built to Last: Successful Habits of Visionary Companies, by Jim Collins and Jerry I. Porras.

Another very useful book for those interested in these kinds of processes is The HP Phenomenon: Innovation and Business Transformation, by Charles H. House and Raymond L. Price. In many ways this is a perspective on the HP process from one of its outstanding engineers (Chuck House was one of the most productive, and a wonderful character besides). He was also involved in some of the aspects of the following story from my first note:

While at Parc, we were also interested in what HP was doing with its “21MX” computers (the first HP computers to use DRAM memories). The two most interesting properties were (a) a very fast microcycle time (similar to the Alto), lots cheaper, and a real product, and (b) that everything important about the Alto was quite missed, including what the display and UI should be like, and what kind of software and applications, etc., should be run. That contrast has stayed with me (it’s another one of those “Context is worth 80 IQ points” situations). Basically, they already had a mini computer without a real graphics display, and they stuck with this when the revolution of cheap DRAM and other integrated circuits happened. In theory, they should have been the company to do the personal computer revolution — consider the HP-35! — but they quite missed it.

I and my research group did work for HP — ca 2003–2006 — but this was a completely different company which just had the same name — and this experience doesn’t have anything to do with Packard’s book.

I think a good way to finish this review would be to specifically refer the interested reader to page xvii of Collins’ foreword, and his observations about the difference between understanding the principles — and applying them — vs labeling some of the resulting processes and trying to extract further good from them (my analogy would be the difference between the ideas of a benevolent founder of a religion and the reification of the “trappings” into rituals that mostly quite miss the original intentions).

Why did Alan Kay say MS-DOS was a distressing thing?

I believe that you can praise in a few sentences, but that criticism morally demands longer and more careful arguments. I don’t know how to be fair about a criticism on this subject in a Quora-sized answer, but I can point you in a few directions to see what I’m driving at.

First, you can learn a lot by looking at the states of the art of OS designs in the 60s and 70s. (Here there are some that are easy to praise.) For example, the Burroughs B5000/5500 MCP OS designs starting in the very early 60s.

Another one of my favorites from the mid-60s is the “Project Genie” (later SDS 940) OS mostly designed by Butler Lampson (this was the OS used on the famous Engelbart “mother of all demos”).

Doug McIlroy’s view of Unix (and the original notion of how a tiny kernel should be able to be extended by user processes), ca 1972 or so.

And, I also liked the “it’s all a live OOP language down to the metal” approach we used for Smalltalk at Xerox Parc.

Worth notable mention among small-machine OSs were certain aspects of the Amiga OS.

The above is not an exhaustive list, but it’s a starting place.

You should also look at the progression and dates of both the bottom (kernel) and the top (UI) of the MS operating systems, and what features they tried to add, and when. And it’s worth noting the large change-outs along the way — for example, what were the needs besides simple scaling for NT (which was more like a “real OS”, albeit an old mainframe-type design)?

The basic idea here is that in software design the worst you should be doing is reinventing the wheel (this is sometimes just right), but “reinventing the flat tire” should be criticized heavily.

There have been even more severe cases in software of people “reinventing the broken wheel”, that is, producing an artifact that can’t really be seen as a wheel and where it is not clear how to go about reshaping it into a wheel. (This is hard to do after one has taken the time and put forth the effort to learn about wheels, and how to make them.)

Another way to look at it is that very few people are as smart and capable as Butler Lampson (or Bob Barton and the B5000 folks). Why not start with “the best so far”?

How can life form from nothing?

There are four parts to this.

The easy part is that many constructed things have very different properties than their parts. For example, we can start off with materials that don’t do what wheels do, and yet make a wheel, make a car, make a radio, etc.

The medium part is that from many different kinds of materials — from water and gas valves, to doorbells, and even Tinkertoys — we can first — easily — make a universal logic gate, and then by just organizing these in different ways we can make a programmable computer that can do anything any computer can do, including looking at and writing its own programs.
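
A small sketch in Python of the “universal gate” point (an illustration only): everything below is built from a single NAND gate, and pieces like these can keep being composed upward toward a whole computer.

    def nand(a, b): return 0 if (a and b) else 1

    def inv(a):     return nand(a, a)
    def and_(a, b): return inv(nand(a, b))
    def or_(a, b):  return nand(inv(a), inv(b))
    def xor(a, b):  return and_(or_(a, b), nand(a, b))

    def half_adder(a, b):                 # one step toward arithmetic: sum and carry bits
        return xor(a, b), and_(a, b)

    for a in (0, 1):
        for b in (0, 1):
            print(a, '+', b, '=', half_adder(a, b))   # (sum, carry)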

A fun part is that the analysis of living stuff indicates it is all composed of a few kinds of atoms. Molecular biology over the last 100 years or so has found out many (most) of the organizations of atoms needed to make the different kinds of molecules needed for life. “What Life is” is quite well understood.

There are many useful analogies to the much easier to understand levels of architecture of computer stuff that give rise to very complex systems (and systems of systems) etc. This is worth understanding, and then taking a look at the molecular mechanisms that even simple life today uses. There is a big jump in scale from large non-living molecules and what we call “life” (but no new ingredients).

The breathtaking part is how the atoms available combined with various energy sources and non-living matter might have bootstrapped structures that could reproduce just well enough to be subject to the further random accidents and stress-that-filters of the environment which is what “evolution” is mostly about.

I say might because what we know right now is still not strong enough to induce the actual bootstrapping from scratch. It’s important to realize that this is possible, but just how probable is not understood at this point. Part of the difficulty is that the earliest life forms would almost certainly not be DNA-based, and would lack many other characteristics we are used to seeing in life forms today. So the search is really for something that would work at all, that could then randomly change, sometimes into something more capable and efficient.

There are a number of very good people who have been looking at these questions over many decades. One of my favorites is the Nobel Prize winner Jack W. Szostak. Take a look at his work (and he also has some excellent exploratory videos on YouTube).

Have screenshots of the Xerox Parc GUI ever been published?

Sure. Here’s one of Smalltalk-80, and here’s a much earlier one from Smalltalk-74. Here’s an image of Smalltalk-76 from a paper Dan Ingalls wrote in 1978.

Here’s a Dorado screen with one of the Smalltalks (maybe Smalltalk-78). There are quite a few of these to be found in Google images.

Why haven't we been able to recreate Bell Labs or Xerox PARC today?

The computer parts of Parc are best thought of as parts of the ARPA-IPTO research community, were stocked with researchers who carried the ARPA vision, and who learned their stuff and process in the ARPA community. Looking back on that experience, and at other research setups over the last 50+ years or so, I think that “the goodness of the results correlates most strongly with the goodness of the funders”.

For ARPA, it was a combination of a great, good, compelling, and romantic vision of a “destiny”, combined with a number of principles (I’ve written about these on Quora before), and especially the idea of finding the best people to try to find ways to realize the vision.

Bell Labs was a different setup and process, but a good overlap was indicated in a sign that used to be in many places in Murray Hill: “Either do something very useful or very beautiful”.

I think the chances of a few top researchers being around in any era are good, but it has certainly been the case that “great funders” are -not- around in every era.

Much of “management” is (a) about dealing with people who are not self-driven to -find-out-, and (b) about striving to carry out plans successfully and on deadline.

On the other hand, it’s the nature of “invention level research” that only some of the process can be planned. In my experience great researchers are quite self-driven to -find-out- (they don’t need much management), but they do need open-ended time (and a little space and resources).

Good funding for research is rather like “MacArthur grants for groups” — researchers are a kind of artist — and what doesn’t work out is not failure, but overhead for what does. If the visions were lofty, then everything is lifted by orders of magnitude by what does work out.

This very different process makes most “command and control” management systems in business and government today quite nervous and anxious. I think they would much rather feel completely in control of mediocre processes than feel out of control with truly great processes.

To make this explanation more complete (or more mysterious) we need to mention top scientific research institutions — such as the Santa Fe Institute — and note that they do exist today, and are organized very much as I have described Xerox Parc. Just last week appeared a magisterial book for the intelligent public — SCALE — by Geoffrey West of the Santa Fe Institute. Amongst the many important expositions of the central subject, there is also an Afterword, which contains a section on how the Santa Fe Institute was set up, the general nature of its processes, and some of the influences on its design.

The mysterious part seems to have something to do with “real science” and what much of computing seems to be about today, and what investors in computing are interested in. Parc was very much a “computer science” research institution in the old meaning of the term (where the word “science” was taken to be like the existing serious sciences). Perhaps what has lapsed in the field since then is part of the answer to this most puzzling question.

What's Alan Kay’s opinion of the D programming language?

I don’t think I know enough about D to have a supported opinion. I’m not a fan of C or C++ (but not because they aren’t like Smalltalk …)

How did Ivan Sutherland create Sketchpad? What programming language did he use?

Take a look at Ivan’s “Commemorative Lecture” for the Kyoto Prize for some insight into his special mind. Everyone should read Ivan’s thesis — Sketchpad, A Man-Machine Communication System (1963) — one of the greats of all time, and it also answers your question.

Most programming in 1962 was done in machine code, despite a few somewhat higher level languages starting to work (e.g. Fortran and the very recent Algol). However, these languages were mostly to clean up the kinds of programming done in the 50s, and did not offer much to help build “semantic structures”. COBOL had a few more ideas but was tied very much to “unit records” and was quite ungainly to program in.

Most of the good programming in “assembly code” back then was done in macro-assemblers, some of considerable power along a variety of dimensions (for example IBM’s “Autocoder” allowed conditional expansion of macros to optimize general routines for specific kinds of parameters).

The underlying structures of Sketchpad came primarily from what was sometimes called “data-driven” programming (where pointers to subroutines useful to handle the data were embedded in the data). The MIT versions of this can be traced to Doug Ross, who had done some of the earliest computer graphic renderings (such as bouncing balls, etc.) at the Whirlwind project in the 50s, and was instrumental in the APT projects (Automatically Programmed Tools), which required quite a bit of structured graphics to represent the 2D and 3D shapes. This and the programming difficulties led Ross and his group to develop AED (Algol Extended for Design).

Meanwhile, Lincoln Labs had the very large TX-2 computer with an architecture that allowed easy multiprocessing — via multiple program counters etc. Among other things this allowed a programmer to make I/O gear with a minimum of hardware, most of which would be simulated via programs on the program counters. (We used this great idea heavily later for the Parc Alto.)

The TX-2 had a large oscilloscope screen and an instruction that would plot a point on it. This took 20 bits of the 36-bit TX-2 instruction. More of the bits were used to hold an address to jump to if the light pen saw the dot that was just plotted. In Sketchpad, about 1/2 of the enormous machine’s memory and capacity was used to hold the instructions that generated the dots and carried the back pointers for the light pen (these would be shuffled so the display would “twinkle” rather than do the more distracting “swooping” it did when drawn in order). All graphics beyond the point plot were done in software by Ivan.

The programming of Sketchpad was done in assembly code and soon the macro assembly language called CORAL (Class Oriented Ring Associated Language), which was primarily the work of Ivan’s office mate at Lincoln, Larry Roberts (who later was the director of ARPA-IPTO and was the force behind the actual building of the ARPAnet).

One of the most interesting properties of this language was the support it had for “threaded lists” that were used to hold collections with two way relations. So for example, one would like to have a collection of all objects, but when encountering an object one would like to find out what collections it belongs to. For example, the light pen can see a point, the point knows which line it is in, the line knows which construct it is in, the construct knows what construct it is in, etc.

Threaded lists were a very tricky way to do this, and it wasn’t too long before other ways were tried (note that it is still an issue in most programming languages today which lack associative functionality).

The other property that CORAL made easier was getting to the embedded procedure pointers — for example, the pointer to the procedure that could draw the object on the screen was located at the same offset from the front of the “n-component-element” (as Ross called the storage structures). I think you can see that this anticipated most of what the “not-quite-object-oriented-languages” like Simula (and later C++) were about — what was lacking was messaging and encapsulation — but the core ideas, especially in Sketchpad, led to the more general “real object” notions. (Sketchpad and Simula cannot be praised too highly.)
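
A rough sketch in Python of the “data-driven” dispatch idea (not CORAL, of course, and with made-up names): each element carries a pointer to the procedure that knows how to handle it, stored in a known slot, so generic code can walk a picture and draw whatever is in it.

    def draw_point(e): return 'point at (%d, %d)' % (e['x'], e['y'])
    def draw_line(e):  return 'line from %s to %s' % (e['p1'], e['p2'])

    picture = [
        {'draw': draw_point, 'x': 3, 'y': 4},
        {'draw': draw_line,  'p1': (0, 0), 'p2': (3, 4)},
    ]

    for element in picture:
        print(element['draw'](element))   # the data itself says how it is to be drawn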

Finally, there was the problem of dynamic constraint solving. Some constraints were easy to solve (e.g. this endpoint of a line is connected to that endpoint of another line), some could be solved by solving linear equations (e.g. using Gaussian elimination), and some were n-dimensional non-linear, and were handled by a “relaxation algorithm” Ivan got from an old book from the turn of the 20th century. This did n-dimensional hill climbing to minimize overall errors — this is a wonderful “engineering solution” where “close tolerance” was good enough (if something is off by a pixel it doesn’t matter, etc). The particular way Ivan chose to do the solving would generate the intermediate solutions — the drawings would crawl to the solved states — and if a system was underconstrained (such as a crankshaft) Sketchpad would generate all the solutions (and this would animate the crankshaft).
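
A toy relaxation solver in Python, in the spirit of (though much cruder than) what is described above: each constraint is written as an error term, and the unknowns are repeatedly nudged downhill until the total squared error is small; the intermediate states are the “crawling” drawings mentioned above.

    # Unknowns: the x positions of two points. Constraints (deliberately a bit
    # inconsistent, so the solver settles on a compromise): p0 near 0, the points
    # 10 apart, and p1 near 12.
    p = [0.0, 0.0]

    def errors(p):
        return [p[0] - 0.0, (p[1] - p[0]) - 10.0, p[1] - 12.0]

    def total_error(p):
        return sum(e * e for e in errors(p))

    step = 0.01
    for it in range(2000):
        for i in range(len(p)):
            for delta in (+step, -step):
                trial = list(p)
                trial[i] += delta
                if total_error(trial) < total_error(p):   # greedy downhill step
                    p = trial

    print([round(v, 2) for v in p], round(total_error(p), 3))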

Pretty staggering for less than one year’s work on a PhD thesis isn’t it?!!!

Now go read his thesis!

[1] HOPL

What did Claude Shannon do and accomplish according to Alan Kay?

The Wikipedia article is a good place to start, and supplies references for reading by and about Claude Shannon, in my opinion a “real genius” who ranked with just a few people in the 20th century for unique insights and ability to go to great depths with them.

That “Boolean Algebra” from the 19th century could be used to model and manipulate switching problems — including computation — formed perhaps the most wonderful Master's thesis (at MIT) in the 20th century. The insight that this might be possible apparently first occurred to Charles Sanders Peirce in the 1800s, but got lost, and Shannon took the possibilities very deep.

Getting “signal” back from “signal mixed with noise” was perhaps Shannon’s most profound contribution, and his mathematical treatment of the general idea and its problems has had great impact both philosophically (it is a twin sibling of “entropy” in the physical world) and pragmatically (virtually all methods for transmission of information used today owe much to the general ideas and their implications). Shannon was able to create definitions of “information” (and founded the field of “Information Theory”) that are fundamental.

There are many useful perspectives on these ideas. A good start is to read the book by Shannon, which has a simpler introduction by Warren Weaver: “The Mathematical Theory of Communication” by Claude E. Shannon and Warren Weaver.

My favorite way to think about this is from a constructive point of view. One way to construe the result is: If you can state the signal to noise ratio, no matter how bad, you are guaranteed to be able to make a system that can recover the signal from the noise. (Where “system” is not just for transmitting information, but really entails any kind of machinery.)

This result is against most of our intuitions, and I remember being just knocked over by the coolness and profundity of it when I first went through the argument as a teenager.
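
This is not Shannon’s construction, but a toy Python illustration of the flavor of the result: if you know how noisy the channel is, you can add enough redundancy (here, simple repetition plus majority voting) to make the recovered message as reliable as you like.

```python
import random

def send(bit, flip_prob):
    """A noisy channel: flips the bit with probability flip_prob."""
    return bit ^ (random.random() < flip_prob)

def transmit(bits, flip_prob, repeats):
    """Repeat each bit, send each copy through the noise, recover by majority vote."""
    out = []
    for b in bits:
        votes = sum(send(b, flip_prob) for _ in range(repeats))
        out.append(1 if votes > repeats // 2 else 0)
    return out

random.seed(1)
message = [random.randint(0, 1) for _ in range(1000)]
for repeats in (1, 3, 11, 31):
    received = transmit(message, flip_prob=0.2, repeats=repeats)
    errors = sum(a != b for a, b in zip(message, received))
    print(repeats, "copies ->", errors, "errors out of", len(message))
```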

When did Alan Kay first meet Douglas Engelbart? Was Alan Kay present at the mother of all demos?

I first met Doug when he gave a talk at Utah in early 1967. As I’ve said, he was like a “Moses opening the Red Sea”. I then visited the ARC research project at SRI in Menlo Park and met many of the principal characters in the revolution they were fomenting: especially Bill English, Jeff Rulifson, Bill Paxton, etc.

I got to use the system, and also observe some of the prep for the “mother of all demos”. So I knew what was going to be shown, and I absolutely wanted to see the actual gig. I had the flu at the time but went anyway.

It is hard to convey to people the actual scope of the demo: it was done in a very large auditorium, and it used one of the few kinds of video projectors available (borrowed from NASA) — that could do very large situation displays with high resolution video. It was great! The video of the presentation doesn’t provide the sense of “large” that was one of the most striking things about this presentation.

The “mother of all demos” video and the ARC papers can be found by typing “Engelbart” into Google …

---

I see there is now a chatBOT answer to this question — and it is pretty much all wrong. (Quora, why do something this poorly thought through?)

For Alan Kay, what was Claude Shannon like?

I didn’t know Claude Shannon (but among other things he was Ivan Sutherland’s advisor at MIT, in part because he had met the Sutherland brothers a number of times when they were youngsters while he was at Bell Labs). Marvin Minsky was a good friend of Shannon’s.

I know that he liked to ride his unicycle around MIT, and also was very good at making complex things with his hands.

A better question for Quora might be: “What did Claude Shannon do and accomplish”?

How many of the great engineers from Xerox PARC came from backgrounds with low family incomes?

I don’t know (and most of the researchers at Parc were a mix of “scientists, engineers, mathematicians, computerists”).

And I’m not sure what you mean by “low family incomes” (desperate? very little surplus?). I wound up working my way through college for a variety of reasons, including “not a lot of surplus” in our family.

Alan Kay, how should libraries be designed so scientists and engineers can work together more?

This “isn’t quite a question” so here is “not quite an answer” — Doug Engelbart, who was a great thinker about the possibilities for helping humanity with the aid of shared interactive computing, wrote much and showed much about needs and ways to boost collaborative endeavors partly through shared “libraries” of knowledge.

Try typing “Engelbart” into Google and follow the trail

Alan Kay: How do you identify favorable exponentials?

I think this question happened because I used “Moore’s Law” as an example of a “favorable exponential” that could be used to help think about “progress by going out into the future and bringing it back”.

Most of the time I’ve first noted exponentials, and then tried to see what leverage would manifest a few orders of magnitude out. One of the simplest is the cumulative effect of scientific knowledge, which has been both exponential and favorable. The one that makes Moore’s Law work — and many other things — is the reach and power of engineering at all levels of scale from nano on up. Quite a lot of what’s become newly doable just in the last 20 years staggers me (for example, when I was studying molecular biology in the early sixties, gene sequencing seemed likely to be intractable …).

The combination and intertwining of the science and engineering exponentials is “hyper-exponential”.

Why was Bob Taylor fired from Xerox in 1983?

There were large “differences” with the new head of Parc. I had already left by then and at that time was Chief Scientist of Atari.

For Alan Kay, if the present is distracting, then how should you optimally ignore it? What did the team do at Xerox PARC?

By coincidence, I got asked to do several lectures (this week) at Stanford on “Escaping The Tyranny Of The Present”. I think the one from Tuesday just got posted (and perhaps even the one from today).

Take a look at: How to Invent the Future I - Alan Kay

What did JCR Licklider mean when he said if the internet succeeds our biggest problem will be to communicate with aliens?

Lick basically meant that, as scaling increases, the chances to have the shared contexts that communication requires get thinner and scarcer.

This is true between humans and humans (at some point you have to stop trying to send telegrams and send an ambassador who can negotiate meanings), computers and computers, software and software, and humans with computers and software.

All combinations start to be "communicating with aliens".

Somewhere else in Quora I gave some other examples e.g. of what “objects that can search & negotiate” could do to resolve lack of specific knowledge about APIs and object interfaces.

In the end we were able to give Lick his "Intergalactic Network" but despite work at Parc, UCLA, MIT, etc. in the 70s and 80s we didn't manage to pull off solutions to the general problems. These are still open and more needed than ever.

What do classical music lovers think of rap?

I really enjoyed all of the other answers — perhaps my favorite collection of responses to date on Quora.

Part of my background is 9 years of being a professional jazz musician (on guitar, starting in the late 50s), composing experience, many decades to date as an amateur classical pipe organist, and 2019 to the present as a Renaissance and baroque lutenist.

One of the prime characteristics of serious art is to wake us up to pay attention to something important we’ve been missing. From that standpoint, I’d say that some rap is Art.

There are so many kinds of music in the world that we have to think about it as “organized noise” (including “no-noise”, as in John Cage’s “4′33″”). From this view, we have to view rap as Music.

At the next level we can look at the contrasts between “pop music” and “developed music” (which would include both classical and jazz). There’s not a lot of development in either the words or the other sounds in rap (but there is some).

But the Zappa view (which is expressed in another answer) that rap is a form of American poetry seems really good to me. Another answer, which also looked at this from the point of view of Homeric and other bards singing or semi-singing their lyrics as part of the same genre as rap, seems quite on point.

And — there is “form” and there is “content”, and they aren’t completely separable. Form without Content is like Technique without Art — these are chilly and worrisome.

I like music that simultaneously affects us directly without analysis for one part of our brains and has nuances within nuances to be savored by other parts of our brains. Both going on at once is what makes life worth living. Both require some learning, and the second part can require a lot of learning and training (some really requires being able to play the music to touch more of it).

I think rap could develop into an artform like this — this would take much of it out of the popular highly accessible place it is now. What I’ve heard so far isn’t there yet.

This development of a pop form has happened with other music that started out as pop — not just jazz, but some rock and roll.

Perhaps the big question is: will most who like rap now be interested in getting more sophisticated and developed as the art form develops?

---

“Hamilton” is a very interesting good example of using rap in a manner that resembles golden age Greek drama …

What is the most expressive piece of art that you have ever seen?

It’s worth asking to what extent “art” can be sorted, even by a single person. I think a single person can at least create a category called “great” and include works in it (this is harder for a group to do, but it’s possible for them to agree on some to be put in as “great”).

On the other hand, I don’t think it’s really possible to sort to a single “most X” — partly because it isn’t clear what the measure would be, and partly because a goodly portion of expressiveness has to do with the transient moods of the observer.

So I don’t think we can do much sorting between e.g. Bach and Beethoven, Michelangelo and Rodin, Raphael and Vermeer, etc.

Why does imperfect computation have market power?

Why does imperfect music — even bad music — have market power?

What's the point if we’re all going to die in the end?

The point of life is to live life with a point. We are the creatures who are aware that we’ll die, but we also are the creatures who create our own reasons for living. Don’t worry about the universe. Find a point.

Why is squaring both sides of an equation allowed?

Let me confess that I’ve not read every answer. But the ones that I did read were all very concerned with “squaring” etc.

The simplest answer — and I think the reason many people have difficulty with both arithmetic and especially algebra — is that you need to deeply internalize just what the “=” sign symbolizes and asserts: that there is the very same number on each side.

In other words don’t be distracted by the symbols and operations. One way to think about this is that “a number is all the ways you can make it” (i.e. it can be thought of as “processes” (an infinite number of them) as well as a “value”).

This means whatever you can do to any number can be done on both sides of the “=” because there is just the same number underneath the gobbledygook on both sides.

This is what “=” actually means. And it’s why algebra is actually quite easy rather than mysterious or difficult.
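
A small way to see this in code (the particular numbers and operations are arbitrary): if the two sides really are the same number, then anything done to that number applies equally to both ways of writing it.

```python
left  = 2 + 7          # one way of "making" the number 9
right = 3 * 3          # another way of making the very same number

assert left == right                        # "=" : same number underneath
assert left ** 2 == right ** 2              # so squaring both sides keeps them equal
assert (left - 4) / 5 == (right - 4) / 5    # and so does anything else done to both sides
```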

What is the user interface of next generation messenger?

Hi Jenifa

Could you try asking your question in a different way? This will help me understand what you would like to know …

What are the best books to understand computer science concepts?

By far the best book for the general public — the one I always recommend — is “The Pattern on the Stone” by Danny Hillis.

I have several thousand books about computation, but I can’t think of a great next level book. There must be one by now, so it will be interesting to see what other people who answer this question have to say. Meanwhile, I’ll poke around and see what I can find for you.

One way to approach this is as an analogy to atoms -> molecules -> life (physics, chemistry, biology). The easiest things to learn about computing are at the atomic and simple chemistry level, both in terms of organizing simple “switchable switches” to do more complex things, and also via simple programming.

The key is to find ways to make the jump from “chemistry of large structures” to “life” — that is, to qualitatively different and more profound organizations of the basic materials.

There is a fair amount of evidence that many of today’s computerists don’t find their way to this. But many of the most interesting things in computation — such as the Internet, etc — use organizational principles that are much more like “biology” than “chemistry”.

---

Added Dec 2020, thanks to Robert Jain’s comment below: “Hi Alan, what do you think about The Elements of Computing Systems - Building a Modern Computer from First Principles by Noam Nisan and Code by Charles Petzold? Kindly revert.”

I’ve written about the Nisan and Schocken book in other Quora answers. Basically: it can’t be too highly praised for the aim, reach, and effort.

Personally, I would do a few things differently — not as a complaint, but just from a different perspective — especially what kind of software to manifest on the HW, and perhaps a slightly different HW design, perhaps with completely concrete first principles (for example, like the start of the “Code” book).

One of the many great strengths of N&S is that they furnish excellent simulators for every stage of the building process from first principles. And this means that one can learn by doing and debugging every step of the way.

The Code book basically uses a very good approach for “principles” — I would take this path also — but I feel that it has way too many words and a few too many side trips for a starter book. That said, I think doing the first principles part of this journey using batteries, bulbs, and relays as Petzold did is exactly the right way to go.

A large part of “User Interface” — on computers, or just writing or teaching — has to deal successfully with the limited number of ideas we can hold in our minds simultaneously (for simple things 7±2, more generally more like 4±3). Danny does this beautifully in his short clear book. “Code” violates this I think, so it works better as a secondary reference. The N&S book is large but rewards effort by being organized into careful sections that do respect our cognitive limitations.

I think there could be an interesting and rewarding organization of the best methods of each of these books.

If an alien civilisation came to Earth and found computers, how long would they take to figure out exactly how they work (reverse engineer them)?

I agree with Vivian (I was just in the process of formulating something similar, but she said it shorter and better).

On the other hand — not quite an answer — every computer person should be aware of what the Silverman brothers have done in making old hardware architectures work again. One of their most remarkable feats was to reverse engineer a 6502 CPU chip and to simulate it in several illuminating ways: Digging into Technology's Past

If you want to program and you can't use your smartphone and computer how you can code?

Programming is relatively easy, it’s debugging that’s the problem. Or as Tony Hoare once noted “Programming is easier than debugging, so don’t use all your cleverness writing the program!”

I started in the early 60s (in the Air Force), on two different mainframes which ran batch jobs 24 hours a day. Virtually all programs were written in assembly code. This was quite common almost everywhere.

Programmers were the cheap resource — in the AF, they were enlisted ranks — and golden machine time was not “wasted” on us. So once a day we could get one of the computers and an operator (we couldn’t touch the HW) for 3 minutes to run the program a few times and to get a binary memory printout of the after battle carnage plus register contents.

The remedy was to “desk-check” and to write and use tracing macros. The results were very few bugs by today’s standards (at the cost of wakeups in the middle of the night when your “computer brain” had found a bug in your code).

There’s a wonderful true Don Knuth story that happened because he learned to program at the same time and in the same fashion…..

This style of really thinking the code through is actually good to learn how to do even with today’s interactive conveniences (it is analogous to still learning how to remember what you read despite having the written record — it makes enormous differences in both efficiency and reach).

Finally, as suggested in some of the other answers, you can get computing power today at the level of our wildest dreams in the early 60s for much less than most pairs of shoes. Make the investment, and then train your mind to think and do in terms of processes.

What does the expression "Betcha by Golly Wow" mean?

“Betcha” comes from “you betcha” which comes from “you bet” which means “YES!”.

“By golly” is one of many ways to say “By god” if you want to avoid really swearing.

“Wow” means “Holy shit!” means “that’s amazing”

The whole phrase is a hyperbolic exclamation.

How is studying undergraduate computer science at CU Boulder?

They didn’t have it when I was there (graduated in ’66 with a math degree).

How can I come up with an ISEF project relating to Computer Science?

I’m not a fan of competitions where art is a big factor (and it should be here, not just for your learning and doing, but also to carry with you later).

And … it’s usually much much better to do things that really intrigue you that also push into computing “just because”. When something good happens, you might also have a candidate for a science fair.

An experience that made a big impression on me when I was still a student but a bit older than you, was to come in contact with some of the fundamental relationships that have to do with programming. In my case, there were a number of these — two biggies were “Meta II” by Val Schorre, and Lisp 1.5 (page 13) by John McCarthy. Both of these were tiny (~ 30 lines of code) self-definitions of systems that had an enormous amount of expressive power (the former was a self-bootstrapping compiler-compiler, the latter was “The Maxwell’s Equations of Programming” in a tiny, cosmically powerful programming language).

The process is to understand the self-definition, and then re-write it in some recursive language (could be something like Python, etc.) and this would produce the system like magic. (In practice there is a little more code needed around the kernel, but not a lot.)
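
As a hint of what that re-writing can look like, here is a deliberately tiny Lisp-flavored evaluator sketched in Python; it is a toy, nowhere near the full Lisp 1.5 kernel, and every name in it is just for illustration:

```python
# A toy "eval" for a Lisp-like kernel, with expressions written as Python lists.
def evaluate(expr, env):
    if isinstance(expr, str):                      # a variable
        return env[expr]
    if not isinstance(expr, list):                 # a number or other constant
        return expr
    op, *args = expr
    if op == "quote":
        return args[0]
    if op == "if":
        test, then, alt = args
        return evaluate(then if evaluate(test, env) else alt, env)
    if op == "lambda":                             # ["lambda", params, body]
        params, body = args
        return lambda *vals: evaluate(body, {**env, **dict(zip(params, vals))})
    fn = evaluate(op, env)                         # otherwise: application
    return fn(*[evaluate(a, env) for a in args])

global_env = {"+": lambda a, b: a + b, "*": lambda a, b: a * b, "<": lambda a, b: a < b}

# roughly: (define square (lambda (x) (* x x))) followed by (square 7)
global_env["square"] = evaluate(["lambda", ["x"], ["*", "x", "x"]], global_env)
print(evaluate(["square", 7], global_env))                                            # 49
print(evaluate(["if", ["<", 1, 2], ["quote", "yes"], ["quote", "no"]], global_env))   # yes
```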

You should be able to find both of these references online (I found them for you (below)). Both of them are good science fair projects because they require some work and also a good explanation/poster by the student.

They are also a gateway into the kinds of advanced thinking about computing that most people who learn programming never get to.

Good Luck.

John McCarthy et al Lisp 1.5 Manual (ca 1962)

http://www.softwarepreservation.org/projects/LISP/book/LISP%201.5%20Programmers%20Manual.pdf

Val Schorre Meta II Paper (1964)

http://www.ibm-1401.info/Meta-II-schorre.pdf

Will graphical user interfaces ever be taken over by voice user interfaces?

Will classical pipe organs ever be supplanted by singers (or vice versa)?

Will people who like to make things with tools ever want to have servants do the best work?

What made Xerox PARC special? Who else today is like them?

A good book (pretty much the only good book) to read about the research community that Parc was a part of is “The Dream Machine” by Mitchell Waldrop. There you will find out about the ARPA (before the “D”) IPTO (Information Processing Techniques Office) set up in 1962 by the visionary JCR Licklider, who created a research community of 15 or 16 “projects”, mostly at universities, but also a few at places like RAND Corp, Lincoln Labs, Mitre, BBN, SDC, etc.

There was a vision: “The destiny of computers is to become interactive intellectual amplifiers for everyone in the world pervasively networked worldwide”.

A few principles:

Visions not goals

Fund people not projects — the scientists find the problems not the funders. So, for many reasons, you have to have the best researchers.

Problem Finding — not just Problem Solving

Milestones not deadlines

It’s “baseball” not “golf” — batting .350 is very good in a high aspiration high risk area. Not getting a hit is not failure but the overhead for getting hits. (As in baseball, an “error” is failing to pull off something that is technically feasible.)

It’s about shaping “computer stuff” to human ends per the vision. Much of the time this required the researchers to design and build pretty much everything, including much of the hardware — including a variety of mainframes — and virtually all of the software needed (including OSs and programming languages, etc.). Many of the ARPA researchers were quite fluent in both HW and SW (though usually better at one than the other). This made for a pretty homogeneous computing culture and great synergy in most projects.

The above goes against the commonsense idea that “computer people should not try to make their own tools (because of the infinite Turing Tarpit that results)”. The ARPA idea was a second order notion: “if you can make your own tools, HW and SW, then you must!”. The idea was that if you are going to take on big important and new problems then you just have to develop the chops to pull off all needed tools, partly because of what “new” really means, and partly because trying to do workarounds of vendor stuff that is in the wrong paradigm will kill the research thinking.

An important part of the research results is the researchers themselves. This extends the “baseball” idea to human development. The grad schools, especially, generally admitted people who “seemed interesting” and judgements weren’t made until a few years down the road. Many of the researchers who ultimately solved most of the many problems of personal computing and networking were created by the ARPA community.

Parc was the last of these “ARPA Projects” to be created, and because of funding changes from the Vietnam war, got its funding from a corporation rather than from ARPA-IPTO. But pretty much all of the computer people at Parc had grown up in ARPA projects in the 60s, and Bob Taylor, who set up the computing research at Parc, had been the 3rd director of ARPA-IPTO.

Bob’s goal was to “Realize The ARPA Dream”.

Parc was highly concentrated with regard to wealth of talents, abilities, vision, confidence, and cooperation. There was no real management structure, so things were organized to allow researchers to “suggest” and “commit” and “decommit” in a more or less orderly fashion.

Quite a lot of the inventions Parc is most known for were done in the first 5 years by a rather small pool of researchers (Butler Lampson estimates about 25 people, and that seems about right).

One of the most interesting ideas at Parc was: “every invention has to be engineered for 100 users”. So if you do a programming language or a DTP word processor, etc, it has to be documented for and usable by 100 people. If you make a personal computer, you have to be able to make 100 of them. If an Ethernet, it has to connect to 100 devices, etc.

There was no software religion. Everyone made the languages and OSs and apps, etc that they felt would advance their research.

Hardware was trickier because of the time and costs needed for replication and doing and making new designs. In practice this worked out pretty easily most of the time — via not too many meetings — and the powers of HW geniuses like Chuck Thacker. A few things — like the disk sectors and simple Ethernet protocols, etc. — were agreed on, mainly to allow more important things to be done more idiosyncratically. In practice, Parc designed and put in the field a variety of Alto designs (about 2000 Altos were built), MAXCs, Dolphins, Dorados, NoteTakers, Dandelions, etc over a period of about 10 years — i.e. quite a lot.

There were key figures. For example, Parc would not have succeeded without Bob Taylor, Butler Lampson, Chuck Thacker, and a few others.

I would call the first 5 years “effectively idyllic”. And the second 5 years “very productive but gradually erosive” (the latter due to Xerox’s many changes of management, and not being able to grapple with either the future, or a possible grand destiny for the company).

Are Smalltalk and Pharo out-dated?

Sure. What’s disappointing is that Smalltalk is still quite comparable to most programming languages in use today (and not always negatively).

This means that the computing establishment has done a terrible job in coming up with something qualitatively better over more than 40 years.

---

Added: October 27, 2020.

In a conscious analogy to Lisp, Smalltalk is made from just a few ideas with as much of the language as possible as “library”. Both languages have enough of a reflective meta-structure to allow many different pathways from the kernel.

So, a simple way to improve things using Smalltalk would be just to completely rewrite the library from scratch. Dan Ingalls used to do this every few years at Parc, and we did quite a bit — though not enough — when Squeak was done in the 90s.

One of the human “cognitive biases” is “loss aversion” and this ties in with others such as “investment value” (the time put into something makes it worth more), etc. There is also plain laziness, etc. All these have made the history of Smalltalk after Parc completely different from how we used Smalltalk within Parc.

But the 40 years since 1980, with the immense scalings and other happenings that have taken place — really demand a deeper set of designs, some of which affect the underlying semantics, especially for message passing.

I’ve written and talked about some of these elsewhere, so I won’t reiterate here. People who consider themselves to be computerists, and who are interested in languages, should take a look at what languages need to be able to do. If they did, they could not just answer the question, but also start to provide some of the new problems to be solved and the start of solutions for them.

People who are really serious about software should make their own hardware. Why?

The first part of the idea is that computing is about -processes- (all kinds): both understanding them and making them.

At the next level of practicality, if one is making something that is supposed to be good for people to use — that actually might help them in important ways — then the design needs to be in terms of humans-with-processes, and shouldn’t be limited by the particular hardware (and programming languages and systems) that vendors might be supplying.

Finally, a new good and needed idea might not run fast enough or simply enough on existing hardware/software systems.

In all of these cases, computer people should be able to deal with all the levels of organization needed to create the desired art.

This is how personal computers, bit-mapped screens, the Ethernet, the Internet, smartphones, microcode, FPGAs, and (going way back) programmable computers got invented.

How can I understand physics instead of just memorizing equations and facts? It really frustrates me how I only have to memorize equations and just work my way through school without understanding anything.

There are lots of good approaches here. I’d start with thinking about -processes- rather than “relationships” or “words” — and combine this with some experiments (a great thing about the dawn of “real science” in the 17th through 19th centuries is that a lot of the experiments were rather simple, and done without a lot of equipment).

For an example like electricity, you can also use some analogies to more intuitive human-scale processes, such as: “pressure” of “something” for voltage, flow of “something” for current, different kinds of “pipes” that can impede or help the “flow”, and so forth. These can be very useful if you keep in mind that analogies need to be suspected, even as you are using them.

Even though hardware stores aren’t what they used to be, you can derive Ohm’s law for yourself with a cheap battery (e.g. a 9V), some insulated wire, and a cheap meter (you can get one for around $10 that will measure voltage, current, and resistance; an inexpensive handheld digital multimeter from Amazon will do).
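
If you do take measurements, the “law” shows up as a roughly constant ratio; here is a small sketch of that bookkeeping, with made-up numbers standing in for your meter readings:

```python
# Pretend meter readings: voltage across a component (volts) and current through it (amps).
measurements = [(1.5, 0.015), (3.0, 0.031), (4.5, 0.045), (9.0, 0.089)]

for volts, amps in measurements:
    print(f"V = {volts:4.1f}  I = {amps:.3f}  V/I = {volts / amps:6.1f} ohms")
# The ratio V/I hovers around 100 ohms: that constancy is Ohm's law for this component.
```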

Richard Feynman once said “Science means you don’t have to trust the experts”, and this is true for many of the fundamentals. If your school isn’t doing science via getting students to -start- with the experiments and then be guided to derive the formulas, then this is not really a science class, but more like a religion class, and I’d complain like mad! (You could also look at a great old high school physics curriculum done in the late 50s and 60s by top physicists — PSSC Physics — to find a lot of good ways to get into science via actually being a scientist and doing science.)

What does a good object oriented design for a football game look like?

I left inheritance out of the first Smalltalk because I didn’t like the way Simula’s single hierarchies missed important relationships and I didn’t like the lack of expression of meaning (you could refine a superclass or create a new species, etc., and Simula didn’t care). I didn’t like the inheritance system that eventually got put into Smalltalk (too much like Simula’s).

However, being able to make instances from a general description is hugely powerful, and if your language has some of the dynamics of Lisp (Smalltalk did) you can organize further relationships as understanding deepens. For example, a good question to ask about any language is “how easy is it to make real objects in this language?” — if it claims to be an OOL, then “how easy is it to do (say) ‘slot inheritance’?” (i.e. inheritance through the property names rather than through superclasses). And: are variables themselves objects in the language, etc.
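
As one illustration of what “slot inheritance” through property names can mean (as opposed to inheriting through a superclass chain), here is a minimal Python sketch; the lookup rule and the example objects are just one simple possibility:

```python
# Prototype-style objects: an object is a dictionary of "slots"; a slot that
# isn't found locally is looked up in the object named by its 'parent' slot.
def lookup(obj, slot):
    while obj is not None:
        if slot in obj:
            return obj[slot]
        obj = obj.get("parent")
    raise KeyError(slot)

player  = {"parent": None, "speed": 5, "kick": lambda: "kicks the ball"}
striker = {"parent": player, "speed": 8}           # overrides one slot, inherits the rest
newbie  = {"parent": striker, "name": "a new player"}  # hypothetical instance

print(lookup(newbie, "speed"))    # 8, found via striker
print(lookup(newbie, "kick")())   # "kicks the ball", found via player
```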

So, a good first pass for any system is to first identify “ideas” which will have multiple instances and take this as your list of first classes (or prototypes, etc.).

I like to make a scratch system when I’m designing, even if the language has lots of stuff in it, because “real thinking is hard” and I want to do -real thinking- when I design. The scratch system will usually bring up head-scratchers that need to be resolved.

At the next level, note that most interesting systems require something more like an ontology: a set of relationships that is only partly hierarchical — Minsky and Papert called such non-hierarchies “heterarchies”.

Another thing to note is that objects in the real world have “roles” (a person can be a father, a son, a brother, a boss, an employee, etc.). This suggests more interesting ways of looking at objects. The “properties” of objects can include behaviors, so they are not just variables.

And it’s worth thinking about objects being partly made by “sideways compositions” as cooperating aggregates of simpler objects (this implies that objects are themselves namespaces). Many things that seem to require multiple inheritance can be more easily and cleanly described “sideways”. (This is kind of a parametric approach to thinking of complex entities — which is often simpler in important ways.)

And “objects need to be made from systems of objects that are made from systems of objects …” (i.e. instead of the inside being a list of variables and methods). Note that in standard languages you can put objects into variables, but these objects cannot see each other. Composing objects from systems of objects, by contrast, allows them to be, and to be used as, real modules in a recursive fashion.
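
A rough sketch of the “sideways” idea: an object built as a small society of parts that can see each other through the whole they belong to, rather than through an inheritance chain (everything named here is illustrative):

```python
# An object composed of cooperating sub-objects; each part can reach its siblings
# through the shared "whole", so the composite acts as a namespace for its parts.
class Whole:
    def __init__(self, **parts):
        self.parts = parts
        for part in parts.values():
            part.whole = self                 # each part can see the others via the whole

    def part(self, name):
        return self.parts[name]

class Legs:
    def run(self):
        return "running"

class Brain:
    def decide(self):
        # a part consulting a sibling part, "sideways", not via inheritance
        return "decided to keep " + self.whole.part("legs").run()

player = Whole(legs=Legs(), brain=Brain())
print(player.part("brain").decide())   # decided to keep running
```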

But — back to the beginning here — don’t forget to start with a dog-simple scratch system of just instances of relatively simple classes, and think of the more consolidated descriptions as partly optimizations that are possible later. Then think about ontologies instead of hierarchies. Then be able to criticize the language you are in if it doesn’t have enough of a meta-system to allow you to build what you need as you understand more and more. (Don’t try to be tricky: Tony Hoare likes to say that “Debugging is harder than programming, so don’t use all your cleverness to write the program!”)

What are the benefits of learning to programme in Smalltalk?

Take a look at In what way would learning Smalltalk make me a better programmer? for some opinions around a very similar question.

In what way would learning Smalltalk make me a better programmer?

I think the crux of a question like this is to note the old saying that “You can do COBOL programming in any programming language!” In other words, the start is what you bring to a programming language. If it’s an already formed theory and skills about how you program, then you’ll recapitulate these in every language you try.

Another way to think of the question is via metaphorical analogies to atoms, molecules, chemistry, biology, ecology, etc. Computers can carry out “systems of processes”, and what we want to ask is “What is a system?”, “What is a process?”, and “How can we make them?” Keeping those questions in mind helps us gauge how well our programming languages are helping us or getting in the way. “Biology and beyond” is where computing needs to be, so this is a good way to gauge where a program/system is.

If we are lucky, our programming language will have “high expression” in the system we are trying to make (this is rare!). If we are unlucky, we will think the kind of expression our programming language has does fit our goal, even when it doesn’t. This can lead to much of the awful code that we see everywhere today. If we have learned something about design, we will try to understand the best ways to characterize our goal, and we will both fit, and invent and make, structures in our language to model our goal.

One way to criticize programming languages in general is to compare to “human languages” (like the English I’m using to write this) by noting that there is enough “stuff” in English to allow me to talk about many things without having to make up a new term, whereas most programming languages are more like pidgin languages with almost no vocabulary, and the adding of new vocabulary is a difficult task for everyone (what does the new term mean?) and every -thing- (is the new definition well drawn enough?). Etc.

On the other hand, the flexibility of English requires considerable learned skills to make larger structures that are understandable. In the end, design and style are the keys to making something worthwhile. In the “old days” (the old fogey remembering) super high skills and a good macro-assembler allowed lots of good stuff to be done in machine codes. However, this was a kind of brinksmanship that didn’t scale well (and it certainly got me to start trying to invent higher level languages that were much safer, more expressive, but still had real flexibility at all levels).

All of this means the programmer not only has to design processes, but has to design languages. Both of these are quite difficult, in part because we don’t even know how to do either very well. We can compare this to the great difficulties in user interface design, and note that a programming language -is- a user interface, and that a program -is- a user interface (and both are usually quite terrible in these terms!).

At the next level, most languages offer “features”, often in the form of a library or surrounding system. These features can sometimes be useful (a system for doing computer graphics or user interfaces, etc.), but they also have the problem of (a) being tempting by already being there, and (b) being intransigent to changes that are needed for the new goal. What is the cost of bypassing either or both of these? (Note that this extends to the so-called “operating system”, which should be seen as a rather bad idea, but most think of having one as “normal”.)

The way I’d answer your question is that the path towards better programming lies in becoming a better designer, and the ability of a programming language to create what is needed to realize your “better designs” is a good way to assess it.

From this standpoint, Smalltalk is very much a creature of its time. A good way to get a sense of what this means is to take a look at a resuscitation of one of the Xerox Parc Smalltalks (from 1978). This was from a rescued disk pack that Xerox had put in the trash which happened to have an image of this system. Recently Dan Ingalls and Bert Freudenberg did most of the heavy lifting to get it going again, and I used it to make a presentation for Ted Nelson’s 70th birthday festschrift. Here is that 15 minute or so presentation: Alan Kay's tribute to Ted Nelson at "Intertwingled" Fest

What you see here is a language/system/environment, etc made for a tiny computer but with “cosmic goals”. (It was this version of Smalltalk that Steve Jobs saw the next year in 1979 during the famous visit to Xerox Parc. You can compare this to what the Mac did and didn’t do years later.)

Almost 40 years later, we can note what we didn’t put in Smalltalk for a variety of reasons. For example, we only had cycles for limited multitasking (the original idea wanted something like what Erlang does today). We had ideas about a real separation of optimizations from meanings, but didn’t try to do this except for subclassing, which is not a good approach (it would have been really nice to have taken a shot at this). We understood that -names- are too local for the kinds of scaling that were going to happen, and that we really needed “programming and finding via descriptions” — but we didn’t do this. And so forth. What I’m driving at here is that — for any system — we have to try to understand it in the contexts that give us the best view of it, and that includes what was really good and also what was not done.

Some of the good things about Smalltalk are noted by the other answerers of your question. But to circle back to the opening paragraphs, you can learn a lot about how to become a better programmer from existing programming languages if you’ve built enough of a design sense to be able to both criticize and also to see what is powerful.

Did some Xerox PARC engineers understand that they had the technology that can change the world in the 70’s?

Sure! Lee Felsenstein is quite right that the Parc researchers — we weren’t all strictly “engineers”, but a mixture of different types who could all do some mixture of engineering, science, mathematics, computing, cognitive psychology, etc. — aimed explicitly at changing the world for the better along the lines put forth by the real pioneers such as Licklider, McCarthy, Minsky, Sutherland, Engelbart, etc.

As I’ve noted in other answers, Parc can only be thought about in the larger context of the ARPA IPTO research projects started in the early 60s by Licklider and subsequently guided by Ivan Sutherland, Bob Taylor, Larry Roberts, etc. “Fumbling The Future” has too many errors to be a good history, but “The Dream Machine” by Mitchell Waldrop is quite accurate enough to glean some of the breathtaking sweep of this research.

Parc came about when the pressures on Congress by the Viet Nam war protests led to an overly sweeping change to how DoD funded research, especially on college campuses — the Mansfield Amendment — and this got Taylor to worry that the larger aims would not get realized. This happened to coincide with Jack Goldman, the Chief Scientist of the hugely successful Xerox corporation, urging Xerox to set up a long range research center. The link was a physicist, George Pake, who had been chancellor of Washington University (St Louis), where there was an ARPA project (so he knew Taylor). Pake became head of the newly set up Parc (in July 1970) and convinced Taylor in the Fall of ’70 to set up a computer lab there (Taylor hired me as a consultant soon after this).

Taylor cherry-picked relatively recent young PhDs from the ARPA projects to be the initial researchers (he wanted young researchers who had grown up on “the ARPA dream”).

In the first few weeks of 1971, a critical mass was achieved when Taylor convinced a core group of researchers — including Butler Lampson, Chuck Thacker, Dick Shoup, Charles Simonyi, Jim Mitchell, Ed Fiala, Willie Sue Hoageland, et al. — to move en masse to Parc. A second wave of similarly talented researchers started to drift over from the Engelbart project in nearby Menlo Park.

This group was so stellar that I decided not to go to CMU to work on the Dynabook but to become a full time researcher and set up a group with the rest of these wonderful characters. We were most definitely all engaged to change the world for the better.

Why does my school aged kid ask inane questions like "what is gravity made of" or what is a "field made of" rather than just learn the material? How do I discourage this so they can get good marks?

If you assert that you are not kidding here, I’ll be happy to answer your question.

What was working at Xerox PARC in the early 1970s like?

Best time ever! A truly wonderful combination of great people with great aspirations, abilities, and determination.

And great (and needed) protection for the critical years by the “impresario” (as he liked to call himself) Bob Taylor.

What is Xerox PARC’s contribution to the world?

A good book to read — which gives the larger story of the ARPA Information Processing Techniques Office (IPTO) research projects starting in 1962 and how Parc was part of this community and budded from it — is “The Dream Machine” by Mitchell Waldrop.

I think the bottom line is that the particular concentration of interests and abilities of the Parc researchers constituted a critical mass to create one of the possible realizations of the “ARPA Dream” of personal computing and worldwide networking by using many of the previous inventions of world-wide computing research over the previous 10 years, and adding a few inventions on the way.

Parc’s output included a workable personal computer powerful enough to deal with all media, a display that could handle arbitrary images, a general user interface that was reasonably learnable by all ages and types of people which could mediate hundreds of thousands of applications, WYSIWYG interactions, “desktop media” authoring and use, a super-reliable-easily-deployable local area network (Ethernet), the first laser printer, the first dynamic completely object-oriented languages, development and operating systems, and hefty contributions to the Internet (Parc had its own “internet” before the official one). There were numerous “other things” as well, but this should be sufficient to answer your question.

If I am auditioning for a Shakespeare play, should I do an English accent for an audition or should I just use my normal voice?

I agree with the other answers that you should concentrate on pulling off a speech before all else. Clarity and rhythm are the watchwords. Getting it over the footlights with “quiet conviction” and not too much movement — but with “body attitudes” — is the key to stage acting.

Is Alan Kay's statement “science is a set of heuristics to get around buggy brains” a reference to Nietzsche's knowledge-as-preservation?

No — but an interesting question. This statement is my translation into modern terms of some of Francis Bacon’s observations and calls to action in his “Novum Organum Scientia” (ca 1620), where among many other things he identifies 4 “idols that humans worship”, of: the Tribe [human genetics and culture], the Cave [the internal thoughts of an individual], the Marketplace [inability of our languages to represent well enough], the Theater [basically: bad ideas promulgated by schools and academia].

He called for new methods — a “new instrumentality” (“Novum Organum”) — to combat these problems that would lead to new and more clear “knowledge” (“Scientia”). In modern terms we can still use “methods” and we can make the meaning more clear by substituting “heuristics”.

Looking at “science” from this point of view, we should be able to readily see that it should be made the basis of any decent system of education in our time.

“The Gay Science” is one of Nietzsche’s books that I haven’t read (but now will). From the wikipedia article link you gave, some of his ideas are very similar to the earlier ones from Bacon. For many such thinkers, it was not until the 20th century that science had gotten to the place where the gap between “what’s out there?” generating phenomena, and our abilities to understand and represent the causes of the phenomena started to look as though it might not be completely bridgeable (a doubly urgent need for heuristics to get around our “bad brains”!)

I really dislike (to put it mildly) the new Quora UI/UX. Both web and mobile. Am I the only one?

Why introduce modes for expanding answers — by making a popup that forces closing to get back — when expanding them in the scrolling galley was the best solution, and that is what you had? This is “reinventing the flat tire”!

Please change it back.

What is it like to work with Adele Goldberg?

I first met Adele when I was putting together my research group at Xerox Parc in its earliest days, and she was a computer scientist (U of Chicago) finishing her PhD working with Pat Suppes at Stanford on an interactive system for helping young children learn mathematical logic. She was deeply interested in, and very able at every aspect of “computing, children, and curricula”.

I managed to convince her that the ARPA/Parc dream of “personal computing” especially “for children of all ages” for “everyone in the world pervasively networked together” was the most important thing we should all work to make happen.

Our research group achieved critical mass — and then some! — when Adele moved over from Stanford to join it.

Adele is not so easy to describe because she combines a level of intelligence, abilities, energy, verve, and vivaciousness that almost no one else has. Anyone who has met her will know what I mean. She is a force! A person who makes things happen.

[image]

When she was an undergrad, she told a friend she was going to get a PhD in Computing, and when her friend said “But you’ll have to work twice as hard as a man”, Adele replied “Is that all?”

It is also not so easy to describe the dynamics at Parc, especially in our Learning Research Group. I would describe one aspect of it as “a bunch of leaders and individual contributors who somehow listened to each other and sought the opinions of others”. Part of the success had to do with the shared visions — at the highest levels we were all trying to make the same thing happen — and conversely, that there was not a party line as to goals and projects — at the next levels each group and most members of most groups had very individual takes on “most desirable projects”. This resulted in a very wide research coverage of the personal computing and networking landscape.

Adele started to set up our experiments with children, quickly became a major contributor in every aspect of what we were doing, and — to the extent that there was any organization in our group at all — she joined Dan Ingalls and myself as the main “fountains”.

I’m failing at answering the question of “What was it like to work with her?” A weak comment is that it was just really great, and we all got a lot done together. One of my favorite memories is from the time that the top 10 executives of Xerox had been invited out to Parc (ca 1977) to learn about “personal computing”, and our group got the task of teaching them about “modern programming” (and all in 3 hours over 2 days). Smalltalk-76 had just gotten working, and we decided to combine some of our experience with simulations (going all the way back to Simula and several versions in Smalltalk), to see if we could make an end-user programming experience in the domain of simulation that would allow the execs to learn and make simulations that were relevant to them. Adele took the lead on this project and steered the group to make the Smalltalk “Sim-Kit”, a graphically acted out “job-shop” simulator. The execs could make icons for the “things being worked on” and program within the framework to model parameters they were familiar with. The result would be a dynamic portrayal of the simulation. For example: one of the execs ran a printed circuit factory at VariTek and was able to simulate the main line:

[image]

My never to be forgotten memory was that Adele had her second child during the prep for this, and rigged up a harness so she could nurse the baby while writing Smalltalk code on her Alto! I’d give anything for a photo of this, but the memory is indelible!

And we are all still great friends — and occasional colleagues — Adele is on the advisory board of Viewpoints Research Institute, and we’ve recently been discussing fruitful directions for the new Y Combinator Research initiatives in “Human Advancement” that have recently been set up (thanks mainly to Sam Altman).

As a tribute to Adele, let me just say that in my opinion she cannot be praised too highly, not just for her many seminal technical contributions to both computing and education, and not just for her abilities to make things happen on her own and especially with others, but also for her lifetime of service to her field, which has included being President of the ACM, and a force on many critical committees over the years.

Yay Adele!

How does one develop a really strong intuition for mathematics and physics? How can I improve my ability to solve "really challenging" physics/math problems?

I enjoyed reading all of the other answers. One thing to ponder is that your two questions are not necessarily compatible.

And let’s use a less lofty label than “intuition” — how about “guessing”?

(See Kahneman’s “Thinking, Fast and Slow” for his “expository fictions” of fast, non-cognitive, hard-to-train pattern-matching mechanisms and slower, more cognitive, faster-learning mechanisms.)

For the purposes here, let’s say that most of what is called “intuition” is in the fast non-cognitive realm (it’s reacting to things that are similar to what have been seen before). This is the “practice” part of the advice. The practice is essentially changing and shaping the parts of the brain that do quick “commonsense reasoning” to add some of the “un-commonsense” that underlies math and the physical sciences. But it’s still mostly non-cognitive — so it pays to be careful here.

Hadamard’s book “The Psychology of Invention in the Mathematical Field” is quite revealing. He was a leading mathematician, and towards the end of his life he circulated a survey to his friends — his private list of the 100 greatest mathematicians and scientists of his day — to find out how they thought they did their thing. Most said they didn’t use symbols, but did use visualizations of various kinds, and about 20% (IIRC) including Einstein, said they also had “muscular kinds of sensations”.

Certainly, getting well versed in “visualizing” (not necessarily in “images”, but in terms of “configurations”) will help, especially for things that are well known already.

But …

Since “guessing” can go almost anywhere (this is the problem with “intuition” that Prof. Brewer points out), it’s worth getting skilled at heuristics that can help deal with both sides of your question — meaning: we’d like to (a) be able to “guess” pathways to take for “challenging math/physics problems”, but (b) we’d like to avoid the trap that Prof Brewer mentions — to get wedged into the commonplace — and escape so that we can also be alert to new possibilities.

A really simple exercise that can help a lot is to get out your trusty moleskine notebook and on the left page write down what occurs to you when you look at a “problem” or “situation” of some kind. This will capture what your pattern matchers are tantalized with, and it also performs a kind of release that will then let you write down on the right page stuff that is as different from the left page as possible.

I write down ideas also — mostly to get away from them: “Most ideas are mediocre down to bad!” To really think, you have to be able to escape from the delight of having an idea. (School is a killer here because so much of it is about answers rather than questions — many people are wiped out because they think they need to come up with an answer …)

Just have lots of ideas, and then put them somewhere that will not interfere with further ideas and directions.

I like the idea that what’s between my ears is not reality but just a construct of one kind or another, so playing with constructs — fictions — can help things quite a lot.

As voice interfaces get better, how will we draw the line between where we use visual UIs vs. voice?

Jared Zimmerman’s answer is good. One thing he didn’t mention — in general, with regard to semi-AIs as interface assistants — is trust, and how it can be dealt with continuously.

For example, in a reactive GUI, we learn to trust most cause and effect relationships via having more or less what we hoped for manifest itself, not having our state go away, and by having a very reliable UNDO operation.

For agent-based interfaces we have even less contact between our wishes and the means and possible side-effects of giving tasks to agents. The visual often helps when it is manifested as part of a task (driving directions, simple retrievals, transcriptions, etc.).

In general, it’s worth pondering the difficulties of communicating and working with a human assistant who has much more intelligence and both commonsense and special knowledge and context — and can explain reasoning better — than any AI.

IT scientists often say about other scientists that they don't know how to structure software code. Are there ways to learn it without studying CS?

Most code is not “well structured”, even by experienced programmers (even by experienced “structurers”). The early stages of programming are often more experimental than one would hope, and experienced programmers will sometimes take the trouble to try to re-organize the initial ad hoc structurings.

The languages and development systems in general use are pretty terrible at helping reorgs. And most systems and languages wind up with dependencies that gradually become hidden to the point that reorging is both frightening and prohibitively expensive.

Let’s put “Humans First!” here, and mostly blame the current low state and low levels of programming and programming languages for a lot of the problems at the different levels of scale that span from non-pro-programmers to pro-programmers.

A lot of the original impulses behind the development of “higher-level languages” — often called “problem oriented languages” (POLs) in the 60s — was the desire to “program in problem terms” and not have to worry about every detail of method, especially with regard to optimization.

The very early FORTRAN had a foot in both worlds. It was hugely optimized to run well on late 50s and early 60s computers, but it had almost only one trick (represented by its name of FORmula TRANslation); it was almost assembly code in other respects. This led to spaghetti code for almost all programmers.

A very early (1962) beautiful example for simple problems of interest to scientists was the first great interactive language JOSS. A slightly later, much more comprehensive POL was APL (a working system appeared in the late 60s).

Two interesting systems today that can be useful for scientists are the venerable but capable Mathematica (a tool for thinking and modeling), and the relative newcomer, Julia, which attempts to be an efficient dynamic language that uses more or less standard expression syntax.

Putting "Humans First!" again, we could ask what do scientists actually need? for (a) what they think they need, and (b) what they would conceive to need if the language had the facilities.

There is certainly a diversity of perceived needs, but I think most scientists would love to have something they could use to (1) represent theories, that (2) can automatically yield simulations that can probe the theory, (3) suggest more things to check for "out there", and (4) that allows useful "tinkering", but (5) somehow can help convert the tinkerings into (a) really fast executions that are (b) still in forms that help thinking about the processes. A side note is that lots of programming everywhere is done by making use of already existing packages via what are essentially subroutine calls -- the resulting programs might work, but the forms wind up being intractable to help thinking and further explorations.

However, a little pondering might get us to realize that most IT pros need something like the 5 points above, coupled with an executable requirements language (we can see that the scientists also want to program in terms of "requirements", and that most of a solution for one will help the other, even though some of the needs are very different).

“CS” still has a long way to go here!

What are the difficulties of creating "smart APIs", where two programs can automatically figure out a way to talk to each other?

Great question!

It's worth looking at some similar observations over the years (apologies in advance for the length of this reply).

JCR Licklider -- the great originator (1962) and funder of ARPA-IPTO -- in 1963 called for an "Intergalactic Network", and in one of the earliest memos -- to "the members of the Intergalactic Network" -- observed that the greatest problem that would be faced if we had one, would be "communicating with aliens". He meant not just other people across the globe, but with other programs, and that programs would have trouble communicating with "alien programs" and with "people".

(When asked "Why 'Intergalactic'?" he said, "Engineers always give you the minimum, and I want a world-wide network, so I asked for an 'intergalactic' one"!)

As is often the case, this led to a number of tracks. Most of computing ignored Lick's insights and requests. A determined group within ARPA-IPTO (that included Parc after it was formed) sought to make the "communication of bits" part scale to the planet, even with heterogeneous hardware. This resulted in the ARPAnet, and then the Internet.

Also within this community were ideas about "software computers connected by neutral messaging" that could be a universal scalable way to unify -- and make mobile -- software systems, even with heterogeneous software systems (using similar encapsulation ideas and separating protocols from methods) that could freely be mapped onto the interconnected computers that the Internet would make possible. For example, Smalltalk at Xerox Parc in the 70s experimented with "proxy objects" which were gateways to objects in other systems and computers.

The really good idea of the Internet as a transparent cache for "intercommunicating object-processes" never gained wide popularity (but see Reed's thesis below).

The "discoverable communications" implications of Lick's "communicating with aliens" metaphors were less thought about and worked on. But there were a few interesting ideas.

In the 60s, Hans Freudenthal -- with Lincos -- independently took a stab at bootstrapping a communications system from scratch with aliens.

In the 70s, the Smalltalk group at Xerox Parc started to come up with ways to "find functionality" without having to know the local labels for it. For example, how can you find the "sine" function if you don't know its name? Or find a "sorting" facility if it isn't called "sort"? -- for example, in French it might be called "tri" or "trier" or "ranger" -- names don't scale well!

Also at Xerox Parc during the same period, it was realized that it was easier to send a program to a server to be "executed in confinement" than to try to get all servers to understand all data formats (this is where Postscript evolved from a graphics-making language into a "traveling communications protocol"). The insight is that a rather simple executable language can be made universal, but that there is no end to possible formats of "data". This fits with the overall reasons for "send objects, not data".

Doug Lenat in the 70s at Stanford showed with AM (a program that could discover mathematical theorems) and the later Eurisko (a more general "discovery system") how evidence could be gathered, tests made, matching done, etc. without relying on local labels.

Why AM and Eurisko appear to work

Of great note is the 1978 MIT PhD thesis by Dave Reed -- who is also known as "the 'slash' in TCP/IP" -- which sketches an architecture for an Internet-wide operating system of coordinated objects running in a common pseudo-time.

David Reed Thesis (187 pages pdf)

(These ideas were later validated in the Croquet System ca 2003 by Smith, Raab, Reed, and Kay)

https://www.researchgate.net/publication/4029258_Croquet_-_A_collaboration_system_architecture

The first great practical demonstration of "dynamic mobile processes" across heterogeneous hardware was done in "The LOCUS Operating System" by Gerry Popek and his group at UCLA during the early 80s. This wasn't extended beyond underlying Unix processes, but could have been. The first several chapters of the MIT Press book are excellent in laying out the general issues for machine independent processes.

The LOCUS Distributed System Architecture

David Gelernter in the early 80s called for "coordination meta-languages" from which asynchronous communications brokers could be made. Linda was an early provocative result.

Linda (coordination language) - Wikipedia, the free encyclopedia

There were also some too-narrow gestures at this problem -- for example, CORBA (and there are a number of such too-limited schemes today).

Common Object Request Broker Architecture - Wikipedia, the free encyclopedia

A hugely interesting subgenre of gaming instigated by Mike Genesereth at Stanford also has many clues and provocative possibilities -- the "game" requires the human players to write a software system that, at a given point in the future, can read from the Internet a just-supplied description of a game, synthesize game-playing agents from these descriptions, and then have those agents play the game in an environment maintained by Genesereth. If we can see this as "creating automatic programmers who can solve specifications", then it adds a lot to our thinking here.

Gamemaster

There have also been other "unfortunate" stabs at universal interfaces -- CORBA, mentioned above, being a prime example -- that pretty much missed the point of scaling, the Internet, and heterogeneity of software.

A good way to think this through is to start with the benign cases of "friendly aliens" and try to see how this might work.

We could imagine an extension to TCP/IP in the form of a simple but universal and mobile language that could serve as a semantic basis for "matching of meanings" to see if agreements could be found strong enough to allow interchanges.

Given the psychology of most programmers -- who almost never will do something just because it is a good idea -- the lingua franca will have to pay for itself in local ways to have a chance of adoption.

However, we will also need a very comprehensive "expert-and-discovery system" that also has real world knowledge -- such as CYC: a vast combination of a system like Eurisko with extensive real-world knowledge and commonsense relationships -- that can deal with the implications of the expressed semantics.

http://www.cyc.com/

The expert/discovery system can be large, whereas it is likely that the universal description language will be better if much smaller, if only to encourage learnability. (It's an interesting question as to size and features: most programmers are terrible at thinking in terms of and representing semantics -- unit tests are as close as they come.)

This is a large important subject which I've already taken too long for Quora -- but it's worth looking at an interesting, and perhaps bootstrapping, use of these ideas, even before they are perfected for the larger scaling that is the main aim. If we take a look at modules "as they should be" we can readily see that they should be self-contained encapsulated entities for many reasons, yet most will need to make use of external resources themselves. On the other hand, we'd like to debug and vet them to verify that they run as desired, and also that any use of external resources is in accord with their actual needs and specs. (We should be able to see that ordinary type systems are not nearly semantic enough to fill the needs here.)

Instead, we could imagine that each module has a "stub environment" whose members can be stand-ins for the eventual ones, are able to do enough to allow the module to be debugged, and can also act as semantic guards and matchers to verify that the external resources are as needed. These stubs would look a lot like what Lenat's "AM" system would gather up as it is "finding" "interesting" relationships.

It is very interesting to try designs for these stubs! One of the simplest needs in systems made from any kind of modules -- even just functions -- is "referential transparency" -- and normal typing doesn't do it -- i.e. we want to be able to replace one sine function with another sine function, not just one function outputting floating point numbers with another.
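As a concrete (and entirely invented) miniature of such a stub in C: rather than accepting any function of type double -> double, the guard probes a candidate's behavior at a few points and checks a relationship it must satisfy, and only then would the module bind to it. The names and probe values below are just illustrative, not anyone's real system.

```c
/* A minimal sketch (illustrative only) of a "semantic guard" in C:
   instead of accepting any double -> double function, the module's
   stub environment probes a candidate with behavioral checks before
   binding to it. */
#include <math.h>
#include <stdio.h>

typedef double (*unary_fn)(double);

/* Accept a candidate only if it behaves like sine at a few probe
   points and satisfies sin^2(x) + cos^2(x) = 1 approximately. */
static int looks_like_sine(unary_fn candidate, unary_fn cosine)
{
    const double pi = 3.14159265358979323846;
    const double eps = 1e-9;
    double probes[]   = { 0.0, pi / 6.0, pi / 2.0, pi };
    double expected[] = { 0.0, 0.5,      1.0,      0.0 };
    for (int i = 0; i < 4; i++) {
        if (fabs(candidate(probes[i]) - expected[i]) > eps)
            return 0;
    }
    for (double x = -3.0; x <= 3.0; x += 0.5) {
        double s = candidate(x), c = cosine(x);
        if (fabs(s * s + c * c - 1.0) > eps)
            return 0;
    }
    return 1;
}

static double not_sine(double x) { return x; }   /* wrong on purpose */

int main(void)
{
    printf("libm sin passes: %d\n", looks_like_sine(sin, cos));
    printf("identity passes: %d\n", looks_like_sine(not_sine, cos));
    return 0;
}
```

The point of the toy is only that the stub is checking behavior and relationships, not the local name or the mere type signature.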

Finally, for now at least, let me pull back from trying to communicate with "aliens out there" to ponder these problems "right here" -- because most of them can be found in corporate software done by the "same" IT organization.

Some years ago, a well known corporate systems supplier asked me to look at their ~ 350 million lines of code (of course, since a 400 page book only has 20,000 lines, I hardly looked at any code at all). But I was able to identify more than 30 quite separate ways of "sending messages" that had developed over the years, and none of the current schemes were coping with the scales that things had grown to -- so both integration and real-time late-binding responses to needs weren't happening. Reverse engineering was extremely difficult and expensive because they really couldn't say what most of the specs and requirements were and what the code actually did in any detail -- it just "did what it did".

And, anecdotally, how many readers remember looking at a routine they wrote several years earlier and wondering "what did I mean by that label?", "how did I actually write this code so it worked?", etc.?

We are aliens!

I'll end by just asking the question "How can we 'ground' the meaning of software we write in other than our own memories and memoirs?" Let's see what Quora comes up with ....

How do I learn object-oriented programming?

Like “Computer Science” — which had a very different and much better meaning when first coined in the 60s (in part, it represented real aspirations towards finding the strongest notions of itself) — “Object-Oriented Programming”, when I coined the term about 50 years ago, also had a different set of meanings and aspirations.

Today in computing, we find ourselves in situations magnified many orders of magnitude by Moore’s Law and the success of the inventions of Personal Computing and the Internet. I think it’s worth trying to think things through carefully rather than (a) trying to deal with the current conceptions of “OOP”, and/or (b) going back rigidly to much of what was so powerful in the computing milieu 45 years ago.

In rethinking things, we find some old friends in ideas — such as protected modules that are “whole computers”, non-command messaging, requirements and constraints, transactions, “before-and-after”, meta-levels, separating meanings from methods, “objects” as “servers”, and so forth. And many of the old dangers: race conditions, indeterminacy, scaling, reformulation, hopeless (and needless) complexities, and many more.

We are still faced with the large problems of design at too many levels, because it is rare that each concern and requirement can be satisfied in complete isolation, and the number and kinds of degrees of freedom that seem to be needed preclude much of classical mathematical treatment in favor of building and debugging.

There are a number of truly important ideas — many from the “deep past” in our field — that need to be comprehensively understood and pondered — both for intrinsic beauty, and to ask what they mean for today.

For example, the first completely startling system-with-objects that knocked me on the head 50 years ago was Ivan Sutherland’s Sketchpad, already 4 years old. The “entities” in Sketchpad were mostly graphical — they showed up on the display as “things” made of “lines” that were made of “end-points” that had “x-values” and “y-values”, but they were not “data structures” (they were “behavioral”, and the Sketchpad programmer could not do anything like an “assignment statement”).

Instead, Sketchpad was “programmed” by a combination of hand constructed “objects” whose behaviors were “impressed” on the objects in terms of “constraints” (which were the dynamic requirements for each object). The Sketchpad system itself dynamically “solved” the intertwined requirements — and this let the “programmer” think in as linear terms as possible to allow most brain-cells to be used for the difficult problems of design and purpose.

The field back then — and this author — were not up to really carrying this model forward — instead, we found ways to approximate some of the ideas, but at real cost to the integrity of aim that Sketchpad brought. There were a few important exceptions over the years.

But today, it is possible to really address these important ideas about “designing and programming in requirements” with complete separations of “tuning” and “optimizing”.

A lot of the best systems in the future will be a lot more like Sketchpad in approach than most systems today. We need to work to make this happen!

There are a number of other really important ideas from the early 60s that have missed becoming part of our basic tools and thoughts today (partly from the faddism that has always been rife in computing, partly because our not-quite-a-field cares no more about history than any manifestation of pop-culture, etc.).

A good example is how John McCarthy in the early 60s was able to advance states in time without race conditions or violating “logical and functional relationships”. He called the mechanism “fluents”. Today, one of the terms used for this is “computing in pseudotime”. The idea should be familiar: instead of destructively changing things, retain a history of the changes going forward, each new event representing an increasing point in pseudotime, which becomes an obligatory parameter on every object — the aim is consistency of relationship for each pseudotime. With a few more niceties we wind up with a universal use of “atomic transactions”, “versions”, etc.
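A tiny sketch in C of the flavor of this (the names Fluent, fluent_set, and fluent_at are invented for the illustration): nothing is ever overwritten; each change appends a (pseudotime, value) pair, and every read takes the pseudotime as a parameter, so each pseudotime sees a consistent value.

```c
/* A tiny illustrative "fluent": instead of overwriting a value, each
   change appends a (pseudotime, value) pair, and every read takes a
   pseudotime parameter.  All names here are invented for the sketch. */
#include <stdio.h>

#define MAX_VERSIONS 64

typedef struct {
    int    times[MAX_VERSIONS];   /* strictly increasing pseudotimes */
    double values[MAX_VERSIONS];
    int    count;
} Fluent;

/* Record a new value at pseudotime t (t must advance). */
static void fluent_set(Fluent *f, int t, double v)
{
    if (f->count < MAX_VERSIONS &&
        (f->count == 0 || t > f->times[f->count - 1])) {
        f->times[f->count]  = t;
        f->values[f->count] = v;
        f->count++;
    }
}

/* Read the value as of pseudotime t: the latest version not after t. */
static double fluent_at(const Fluent *f, int t)
{
    double v = 0.0;
    for (int i = 0; i < f->count && f->times[i] <= t; i++)
        v = f->values[i];
    return v;
}

int main(void)
{
    Fluent balance = { {0}, {0}, 0 };
    fluent_set(&balance, 1, 100.0);
    fluent_set(&balance, 2, 250.0);
    fluent_set(&balance, 5, 175.0);
    /* Consistent views at each pseudotime; nothing was destroyed. */
    printf("at t=1: %.1f\n", fluent_at(&balance, 1));  /* 100.0 */
    printf("at t=3: %.1f\n", fluent_at(&balance, 3));  /* 250.0 */
    printf("at t=9: %.1f\n", fluent_at(&balance, 9));  /* 175.0 */
    return 0;
}
```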

It should be clear that there is no conflict at all between the idea of protected modules, non-command messages, and “functional relationships”.

There are many more important parts to think about and rethink — but this is already too detailed.

A good heuristic for my own thinking about our new not-quite-a-field is to not just “think systems” (avoiding lower level mechanisms), but to “think Biology”. The latter is tricky because not all the systems principles that can be and are used by Biology are within the current scales of computing. But, if you think about “cells as objects” then many important principles quickly come to mind. (And if we look around for the system that is most like this today, we find Erlang and its derivatives …)

My conclusion here is that in the early stages of any field, it is not a good idea to get rigid and dogmatic, even religious, about “principles that are not strong enough to be principles”.

The Turing Award winner Tony Hoare had a great observation on us in general: “Debugging is harder than programming, so don’t use all your cleverness in writing the program” (this goes for design too!).

What are 5 most interesting and well-paid jobs in CS?

“Real Science” like any great art form is a “calling” not a “job”.

I realize that the use of “Science” in “Computer Science” has now lost its original meaning and intent, and is now more like the usage in “Library Science”. However, as an old fogey who learned and loved the original meaning, I will stand pat on it.

In a “calling” the questions are first and foremost how to grow to be effective at your art. Beyond this, most who are called are greatly aided by good funders, and usually impeded by poor ones. Much of the effectiveness of the ARPA-IPTO community in the 60s and its last add-on of Xerox PARC in the 70s was due to how great funders were able to support “people of possibilities” to realize their potential in the wonderful vision that was the foundation for this research.

There were people who were already “called” to the Romance of Computing before the ARPA funding was started in the early 60s. These formed the early community, and they attracted others, many of them younger with even fresher ideas.

Now, what was your question again?

How can I learn to create computer-generated music?

Which part of your question is more important: “music” or “computer-generated”?

“The music is not in the piano”! So if “music” is the more important part — I think it should be — then developing the internal musical tendencies given to us by our genetics is the most important thing: a lot more singing and dancing as a start!

“The music is not the notes”! whether on the page or played by anything … similarly: “The speech is not the phonemes”!

Music is a way to express both emotions and ideas, especially those that are both important and don’t fit well into speech.

Both speech and music lend themselves to technical analysis and skill at every level — But “Technique should be the Servant of Art, not the Master”!

To developed musicians, most “computer-generated ‘music’” doesn’t sound remotely musical from many perspectives: phrasing and prosody, rhythm, taste, etc. CGM is a kind of “designer jeans” (i.e. expensive pretentious dungarees). It’s a caricature of music.

However, in the hands of developed sensitive musical sensibilities, the computer becomes a wonderful new way to create musical possibilities and expression.

In our age of “fantasy sports” and other “fantasy pursuits” there are many kinds of “fantasy music”. It’s worth putting in the effort to start getting into the real deals.

What is BIOS? What are its characteristics and functions?

With all due respect, this is not a “theoretical computer science question”, or even a “computer science” question (“real computer science” (RCS) is distinct in many ways, in what it is trying to do, from practical engineering). I use RCS here because this distinction has been almost lost for most computerists.

A good enough answer to this question can be found in the Wikipedia article on BIOS (as with many answers to many questions asked on Quora).

How does one mathematize code and write bit-identical code for all platforms?

I don’t think I have used the term “mathematized code”. I have suggested (and my research community has used) the tactic of coming up with “mathematics” for important ideas, and then trying to turn the math into a runnable computer language (also a kind of math but often less “relational”). And to try to separate the -meanings- from the optimizations, etc. For example, that was one of the directions taken in the early 60s when there was a desire for a meta-language for grammars that was formal enough to be reasoned about, expressive enough to allow grammars to be readily understandable and communicated, and also to be “runnable” to make parsers. The Alex Warth “Meta” example in the talk (from 2006 or 2007) is a particularly good one: the grammars are clear, and so are the translation relations.

The “bit-identical” computation ideas came from a desire in the 60s to ensure semantics over many platforms (and this is oddly still not a top priority in the computing world). In the ARPA-Parc community — which was working on personal computing pervasively networked over the world — thus many hardware platforms and likely many OSs — this was a critical need.

The byte code virtual machine idea traces back to an actual piece of hardware — the Burroughs B5000, mostly designed and invented by Bob Barton — which had the aim of being able to directly manifest a reasonable hardware platform for “higher-level languages” (such as Algol). The result was quite wonderful, and at many more levels than just byte-codes (all computer people should know about and understand this amazing design from the early 60s).

Various people realized that the B5000 architecture would make a great and highly portable virtual machine — a notable example was Wirth’s and Weber’s Euler language (CACM Jan-Feb 1966), whose byte-codes were later executed via microcode by Weber. These were called P-codes and were used in a variety of systems, including Pascal.

Once networking is added, with the possibility of a particular system running simultaneously on different hardware and even migrating computations, the idea of “bit-identical” arose. This ups the ante from “very similar computation” to the need for “exact computation” (and we have to figure out what we mean by this).

There is now more work to be done, and the tendency of computer people to borrow code to save work gets in the way. Just to mention an extreme case, when the Croquet system was being built about 15 years ago, the “bit-identical” Squeak Smalltalk was used as the foundational environment. Croquet is based on Dave Reed’s 1978 MIT PhD thesis about replicating computations in pseudo-time over the whole Internet, and so the logically identical objects on each machine have to always come up with the same results going forward in pseudo-time. Eventually it was discovered that, while Squeak Smalltalk did indeed ensure that floating point arithmetic, etc., was the same regardless of platform, it did not ensure that transcendental functions were bit-identical. (Things worked better after this was fixed!)
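A small illustration in C of the issue (the names and the toy polynomial are mine, not Squeak’s actual fix): a platform’s libm sin() is allowed to differ in its last bits from one system to another, so one way to get bit-identical results is to ship your own fixed evaluation, which performs the same sequence of IEEE 754 double operations everywhere (assuming strict IEEE semantics, with no fused multiply-add or extended precision sneaking in).

```c
/* Illustrative only: the platform's sin() may differ in the final
   bits from one libm to another, which breaks bit-identical replay.
   One remedy is to ship a fixed evaluation of your own, so every
   platform runs the same sequence of IEEE 754 double operations.
   (Assumes strict IEEE doubles: no FMA contraction, no x87 extended
   precision.  This is a sketch, not a production sin().) */
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fixed 7-term Taylor series around 0; fine near 0, NOT accurate for
   large x.  The point is determinism, not a real libm replacement. */
static double portable_sin(double x)
{
    double term = x, sum = x;
    for (int n = 1; n <= 7; n++) {
        term = -term * x * x / ((2.0 * n) * (2.0 * n + 1.0));
        sum += term;
    }
    return sum;
}

static uint64_t bits_of(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

int main(void)
{
    double x = 0.7;
    printf("libm sin(0.7)     = %.17g (bits %016llx)\n",
           sin(x), (unsigned long long)bits_of(sin(x)));
    printf("portable_sin(0.7) = %.17g (bits %016llx)\n",
           portable_sin(x), (unsigned long long)bits_of(portable_sin(x)));
    return 0;
}
```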

Note that “byte-codes” are not the issue here, but “preserving meanings” is. So, if a JIT translator is used, however it is done, it still needs to “preserve semantics” — everyone agrees on this, just to different extents on what “preserving meanings” actually means!

It’s worth thinking philosophically about these issues — and to also ponder the pragmatic implications of (a) world-wide computing (b) dynamically migrating computing (see Gerry Popek’s LOCUS system from the 80s), and (c) perfectly replicatable computing. Note, that as a system is scaled, it becomes more and more worthwhile to architect ways to get desired results in the face of more ways to make errors. It’s likely that heuristics that will converge computations to obey constraints will be more useful in the long run than to try to make “perfect clockwork”. This is an “Engineering + Shannon” solution to something that can be difficult when trying for exactitude.

How are these beautiful software GUIs developed?

There is “what?” and there is “how?”. We can guess that “what?” is difficult in numerous ways, because there are many poor UI designs to be seen these days, even with components supplied by OSs and libraries. Quite a bit of good UI design requires deep understanding of how we humans “work in the world”, and it is certainly the case that most computerists today have missed this learning.

There are many more roads that could be taken besides the combination of ideas in the Parc GUI (overlapping windows(views), icons, pointing, modeless, 2+D, models-views-controllers, etc.). Here is an essay I was asked to write in 1989 about this approach: http://www.vpri.org/pdf/hc_user_interface.pdf

The “how?” part is relatively easy if you don’t have to optimize: e.g. if you can refresh the “virtual display screen” after each graphics change, if you can afford to march through all of the boundaries of the regions that are supposed to be sensitive to the pointing action, and so forth.

It will really be a worthwhile exercise for you to do a simple from-scratch system that can be manipulated by an end-user.

You will have arrays of 32-bit numbers that will represent pictures, with each number holding one pixel as three fields — for red, green, and blue (say 10 bits each, so that three fit in 32 bits). Write a routine that will copy one of these — it will have associated with it the x and y and width etc. of the picture — into a very large array that will be your simulated display (you will hand this off to the OS for display after each major action).

Associate a “depth number” with each picture and put pointers to the pictures into an array that will hold the pointers in deepest order first. Write a routine to clear the display array and copy each picture into it. This will manifest as the familiar “2 1/2 D” graphics most often used.

To deal with pointing actions, write a routine that marches through the picture rectangles (you have the x, y, width and height as parameters for each picture) to see which rectangles the pointer point is in (note that you generally want to give precedence to the topmost one).
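If it helps, here is one minimal way the steps above could be sketched in C (the structure layout, the 8-bit-per-channel packing used in the demo, and all the names are just one choice, invented for the illustration): pictures are kept deepest-first, redisplay clears and repaints them all, and picking walks the same list keeping the topmost hit.

```c
/* A minimal sketch of the exercise above (names and sizes are one
   choice): pictures are rectangles of packed-RGB pixels with a depth
   number; the pictures[] array is kept deepest-first; redisplay
   clears the screen array and copies every picture in that order;
   hit-testing keeps the topmost picture containing the pointer. */
#include <stdint.h>
#include <string.h>

enum { SCREEN_W = 1024, SCREEN_H = 768, MAX_PICTURES = 64 };

typedef struct {
    int x, y, width, height;      /* placement on the screen        */
    int depth;                    /* larger = nearer to the viewer  */
    uint32_t *pixels;             /* width * height packed RGB      */
} Picture;

static uint32_t screen[SCREEN_W * SCREEN_H];   /* simulated display */
static Picture *pictures[MAX_PICTURES];        /* kept deepest-first */
static int picture_count = 0;

/* Copy one picture into the screen array, clipping to the screen. */
static void blit(const Picture *p)
{
    for (int row = 0; row < p->height; row++) {
        int sy = p->y + row;
        if (sy < 0 || sy >= SCREEN_H) continue;
        for (int col = 0; col < p->width; col++) {
            int sx = p->x + col;
            if (sx < 0 || sx >= SCREEN_W) continue;
            screen[sy * SCREEN_W + sx] = p->pixels[row * p->width + col];
        }
    }
}

/* Clear and repaint every picture, deepest first, so the familiar
   "2 1/2 D" overlapping appearance falls out automatically. */
static void redisplay(void)
{
    memset(screen, 0, sizeof screen);
    for (int i = 0; i < picture_count; i++)
        blit(pictures[i]);
    /* ...hand the screen array off to the OS here... */
}

/* Return the topmost picture containing the point, or NULL. */
static Picture *pick(int px, int py)
{
    Picture *hit = NULL;
    for (int i = 0; i < picture_count; i++) {
        Picture *p = pictures[i];
        if (px >= p->x && px < p->x + p->width &&
            py >= p->y && py < p->y + p->height)
            hit = p;              /* later in the list = nearer     */
    }
    return hit;
}

int main(void)
{
    static uint32_t red[100 * 80], blue[120 * 60];
    for (int i = 0; i < 100 * 80; i++) red[i]  = 0xFF0000u;
    for (int i = 0; i < 120 * 60; i++) blue[i] = 0x0000FFu;

    static Picture a = { 10, 10, 100, 80, 0, red  };
    static Picture b = { 50, 40, 120, 60, 1, blue };   /* on top */
    pictures[picture_count++] = &a;
    pictures[picture_count++] = &b;

    redisplay();
    Picture *hit = pick(60, 50);   /* inside both; topmost b wins */
    return hit == &b ? 0 : 1;
}
```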

You might try to set up two UI actions:

-clicking-, which will bring a picture to the top (meaning that you’ve changed its depth to “highest”, so the redisplay will draw it last, and this will appear to move it to the top),

and

-dragging-, where you will have to remember where the pointer was so you can appropriately change the x and y parameters of the picture you are pointing at so the redisplay will appear to move the picture to its new x and y location.

This will motivate writing routines for figuring out pointing actions. For example, a “click” will start with a “mouseDown” and be followed in some max time with a “mouseUp” action. If the max time is exceeded then this would be called a “mouseStillDown” and this might be associated with a -drag- action (as given above).

And this will motivate writing an event handler of some kind. A simple one is a kind of conditional scheduler consisting of a loop that has tests for various pointing actions with the “then” part calling the appropriate action routine.
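One of many possible shapes for such a loop, sketched in C: to keep it self-contained the pointer input is “scripted” by a fixed array of samples, and the names, the 300 ms click window, and the printed actions are all invented for the illustration.

```c
/* One possible shape for the simple "conditional scheduler" loop
   described above.  The pointer is "scripted" so the sketch runs on
   its own; real input and real action routines would go in its place. */
#include <stdbool.h>
#include <stdio.h>

typedef enum { MOUSE_UP, MOUSE_DOWN } ButtonState;

typedef struct {
    ButtonState button;
    int x, y;
    long millis;
} PointerSample;

/* Scripted input: press at t=0, hold and move, release at t=900. */
static PointerSample script[] = {
    { MOUSE_DOWN, 10, 10,   0 },
    { MOUSE_DOWN, 30, 25, 400 },   /* still down past the click window */
    { MOUSE_DOWN, 55, 40, 700 },
    { MOUSE_UP,   55, 40, 900 },
};
static int script_pos = 0;

static PointerSample read_pointer(void)
{
    return script[script_pos++];
}

enum { CLICK_MAX_MILLIS = 300 };

int main(void)
{
    bool down = false;
    PointerSample press = { MOUSE_UP, 0, 0, 0 };
    int samples = (int)(sizeof script / sizeof script[0]);

    for (int i = 0; i < samples; i++) {           /* the scheduler loop */
        PointerSample now = read_pointer();

        if (!down && now.button == MOUSE_DOWN) {  /* mouseDown          */
            down = true;
            press = now;
        } else if (down && now.button == MOUSE_UP) {
            if (now.millis - press.millis <= CLICK_MAX_MILLIS)
                printf("click at (%d,%d)\n", press.x, press.y);
            down = false;
        } else if (down &&                        /* mouseStillDown:    */
                   now.millis - press.millis > CLICK_MAX_MILLIS) {
            printf("drag (%d,%d) -> (%d,%d)\n",   /* treat as a drag    */
                   press.x, press.y, now.x, now.y);
            press.x = now.x;
            press.y = now.y;
        }
        /* other "virtual loops" would be threaded through here */
    }
    return 0;
}
```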

And this will motivate getting around many of the things that are too primitive in C to be reasonably used directly.

For example, a good ploy in lower level programming of any kind is to set up your own multitasking scheduler. A very simple way to do this is “loopless” programming, where you don’t trap control into long loops, but instead thread the control through each iteration in all the “virtual loops” that need to be happening in each simulated process.
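A toy illustration in C of the “loopless” idea (the tasks are invented placeholders): each simulated process keeps its state in a struct and advances one small step per call, so a single outer loop can interleave all the “virtual loops” without any of them trapping control.

```c
/* A tiny illustration of "loopless" programming: no task traps
   control in its own long loop; each keeps its state in a struct and
   advances one small step per call, so one outer loop interleaves
   all of the "virtual loops".  The tasks are toys, for illustration. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    int i, limit;
} Counter;

/* One step of a "count to limit" virtual loop; false when finished. */
static bool counter_step(Counter *c, const char *name)
{
    if (c->i >= c->limit)
        return false;
    printf("%s: %d\n", name, c->i++);
    return true;
}

int main(void)
{
    Counter blinker = { 0, 3 };    /* pretend: blink a cursor 3 times   */
    Counter poller  = { 0, 5 };    /* pretend: poll the pointer 5 times */
    bool any_alive = true;

    while (any_alive) {            /* the one real loop: the scheduler  */
        any_alive = false;
        any_alive |= counter_step(&blinker, "blinker");
        any_alive |= counter_step(&poller,  "poller");
    }
    return 0;
}
```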

All that I’ve mentioned can be done in a few hundred lines of C at the most. What’s intricate about it is the various higher-level architectures put on top of C that you need to think through.

Speaking of that, one of the biggest motivations for “real object oriented programming” was computer graphics, starting with Ivan Sutherland’s Sketchpad system in 1962. So many things in interactive graphical systems are much more “object-like” than simple data-structure-like.

A final thing you might try as part of your test architecture is to ponder that “things made from things that are made from things” appear overlapped on a 2D screen. So a pointing action is really a ray going down through all of the layers. This means that it is not clear which of the objects’ handlers should be called.

Note that it is often the case that it is the “bigger” “outermost” picture that will have the “inner” ones as parts. It’s often the case that the “outer” object will need to have the first cut at figuring out what to do — on the other hand, many operations will want to be done with the “innermost” object. This suggests that some form of associations will need to be set up to deal with “ownership” (a two way relationship) and with bundling all the hits from a pointing action into a structure that can help sort out who will do what.

All of the above starts to motivate moves away from simple data-structures to more comprehensive ways to deal with relationships.

And this should motivate a move away from C and C-like languages, which are almost as far away from these ideas as machine code.

But, for learning, it can be a good thing to deal directly with some of these mechanisms.

Meanwhile, the much larger part of all of this is to learn “the art of understanding humans” and this leads to “the art of user interface design”.

What is a good advanced OOP course that is equivalent to CS108 at Stanford?

There is much too much unsubstantiated criticism on the web. I think real criticism should be just the opposite: highly supported with lots of counter examples, etc.

So, having looked at the webpage for CS 108 at Stanford — Object Oriented Programming — I’ll beg off from attempting a critique for lack of time and energy, and instead will offer an emotional reaction: I didn’t feel any warmth for the approach, especially with regard to any reasonable meaning for “Science” in “Computer Science”. Nor did I feel warmth if this were intended to be a course at any level in “Software Engineering”.

What to do? I’d suggest looking at some of the most interesting examples over the years: from Ivan Sutherland’s Sketchpad — the main inspiration for my interest in this — to what was accomplished at Xerox Parc — to what is being done today in ways that are most like some of the original ideas in OOP as I envisioned it e.g. the massively parallel organizations and philosophy of approach of Erlang.

Along similar lines, I would avoid trying to get a picture of “real OOP” from looking at stuff done in C++, Java, etc. (it’s not that you can’t set them up for “real OOP” but that the standard practices are not in “real OOP” style, but essentially in a parallel (I think massively weaker) style of Abstract Data Structures that I think has always been a real dead-end and distraction from what is needed).

All that said, we need something much stronger today — qualitatively stronger — than even the best ideas of OOP from the past. A recent answer of mine on Quora touches on this.

What is Alan Kay's definition of Object Oriented?

The other answers are well worth reading. An interesting question arises about terminology: to what extent is it reasonable to try to retain original definitions versus having a term get “softer” in meaning as time passes and new and additional conceptions are formed? (The latter happens in part because of the ways we spontaneously evolve our languages, so trying to legislate against it doesn’t work.)

However, I think “colonizing” a term to get reflected status is not a good practice, because it really weakens the central ideas (for example, C++ is called an “object-oriented language” — and most people argue “it is!” — but it is much too far from the ways I was thinking to be included in any definition I would come up with).

Part of the problem here is that I made a mistake with how the term was coined — I should have picked something else — in hindsight: “server-oriented programming”?

In any case, the “server” metaphor — mentioned by Eric des Courtis below — is good enough here (since I’ve written about and answered questions about “objects” in the past, and especially in “The Early History of Smalltalk”, written for the ACM ca 1993).

There is a bit of a red herring here because the power of a comprehensive universal building block can also be its downfall. For example, a “server” could choose to allow its encapsulation to be violated — e.g. by making its services closely resemble data structures acted on by procedures. Here, in my opinion, we would be simulating quite the wrong kinds of things, and devolving back into weak and fragile programming styles. (That is my view of what has mostly happened with “objects” — “real objects” never showed up because most people wanted to retain their data-oriented style, etc.)

We could argue that the definition was incomplete — even: poor. It allowed too much discretion on the part of programmers (this was partly because we used it for our own purposes at Parc and thought — not terribly accurately — that -we- had sufficient discretion to use it wisely (certainly not 100% of the time!)).

And … this November will be the 50th anniversary of my “recognition” of the powers of the simple idea of making everything from “encapsulated servers exchanging non-command messages”.

Even though — in my opinion again — the simple idea of making computation systems “be like computers on the Internet” still isn’t generally recognized, much more is needed in programming and systems building than “a great recognition” from 50 years ago which had enormous relative power for about two decades.

For example — today and tomorrow — we should be programming in terms of “requirements and goals” that can manifest a workable system (possibly needing a super-computer).

We should be able to optimize a system like this without touching the requirements and goals part, etc. The feeling of such programming should be like the CAD-SIM-FAB cycles in more developed parts of engineering. In other words, we want to devote most of our attention into “the whats” rather than “the hows”, use most of our energy for design, and we’d like to “ship the design!” (that would be a good slogan for the next few years).

Just as the great language Lisp was first for programming, but then became “a very high level machine code” for higher level ideas, we should see that what was powerful about direct programming with the kinds of object systems we made at Parc and subsequently so many years ago, should now be retained for structural integrity and other pragmatic reasons, but that the code should now be automatically written from much higher level sources. This doesn’t mean that “objects are now hidden”, but that they should be part of the “modeling and designing of ideas and processes” that is the center of what programming needs to be.

How difficult is Stanford's CS231N?

This is not a “Computer Science question”, it is a question about academics at a particular institution.

At OOPSLA 1997 Alan Kay gave a talk titled "The computer revolution hasn't happened yet". What parts have materialized thus far and if not why not?

The excellent answers by Mark Miller and by Tyson Edwards both get at some of the problems with “a real computer revolution”.

One of the analogies, and questions, we used 50 years ago to think about this was to the printing press, and “when did the real printing revolution actually happen?” Back then, I started to argue that the real printing revolution started in the 17th century, and what happened in the 150–200 years in between was what McLuhan had identified as “the new media starting out by absorbing the forms and content of the old media”. In other words, the printing revolution was really about the invention of science and modern governance, not about getting the Bible to more people more cheaply. People thought it was about making old writing more “convenient”, but it actually toppled whole social and belief systems.

You can see that the important part of this phrase involves trying to figure out what a real computer revolution might be all about — and that — if it is a -real- revolution, that it won’t be about what first seems to be in view.

Here is an essay I wrote about 10 years ago on this subject — it has some of the arguments, and some examples of what an actually “new literacy” might be like — but don’t let it get in the way of your own thinking about this important question!

http://www.vpri.org/pdf/m2007007a_revolution.pdf

Is an iPad that boots into Squeak (or Croquet) effectively Alan Kay's dynabook?

The short answer is “No”, despite the iPad having vastly more computing, storage, display, network, and battery resources than the “Dynabook model” thought up in 1968 and sketched in several papers in 1972 and afterwards. http://mprove.de/diplom/gui/Kay72a.pdf

A good historical “first pass” essay is a 1975 conference paper I wrote that showed many examples of the first 3 years of work on this project, and the goals that are expressed are in a more useful and less diffuse form than the above “vision paper”: http://mprove.de/diplom/gui/Kay75.pdf

This was written on the “interim Dynabook” personal computer at Parc that is described in the paper, and was printed on the first laser printer (another one of the early fruits of Parc at that time).

“No”, because the Dynabook was primarily a service-idea, and the iPad does not deliver enough of the services envisioned 48 years ago to qualify. An answer longer than is suitable for Quora — but better than I can do here — can be found in an Afterword I wrote a few years ago for a book on children’s programming: http://www.vpri.org/pdf/hc_what_Is_a_dynabook.pdf

I’ve got the latest iPad Pro here to try things. I can draw, so I’m naturally interested in how well one can draw on it (partly because this has something to do with children and others learning to draw on it). In the Notes app, I think this is “good enough”. In some of the other apps things are too slow, but let’s not blame the iPad.

However, where does one put the stylus? Steve sent me one of the pre-release versions of the iPad for comment (and you could get a capacitive stylus that would work with it, sort of). But where would you put it? I asked him that, and he said “no one will use a pencil”. I said “That’s good for 2 year olds and 92 year olds, but everyone else uses and learns to use tools that are extensions of our gestures and hands”. (…)

The Dynabook is exhibited with a keyboard because even then it was known that a keyboard for rapid typing would be needed (this because of experience with the first good tablet based system at RAND, the Engelbart system with chord keyboard, and some experiments with touch sensitive keyboards on displays (the possibility is mentioned in the first Dynabook essay)). We liked the idea of a displayable keyboard but worried that it wouldn’t be good enough to really use for serious typing …

The iPad Pro does have a keyboard accessory but the way it works (not well) with the iPad and the stylus (not well) indicates perhaps a desire to just compare feature points with the MS Surface. This somehow reminds me of the difficulty of lifting out a MacBook Pro from a book bag — it’s too slippery and it is too heavy not to have something to help! It seems to have been designed only for looks …

(On the other hand, note how long it took Apple to even create a package for the iPad that combined a stylus, touch sensitive display and a keyboard (which were the original specs for the Dynabook) … and perhaps the Surface had more to do with this than actual realizations at Apple?)

It would be disingenuous not to point out a design flaw in the 1968 Dynabook model here (just because we didn’t think about it, and it wasn’t thought up and done until much later). And that is that it would be great to have a touchpad below the keyboard, despite having a touch-sensitive display and a stylus. But, after this invention in the 80s and many years of experience with it, we can note that the iPad Pro keyboard is not equipped with this most useful affordance. Yikes!

But the larger issues have to do with intended uses. The current question assumes that Apple will allow dynamic languages to be run, and (I suppose) for projects and interactions that involve code to be shared. Let us suppose that they make a better integration of the keyboard and fix the stylus and inking speed problems. Now we have to look at what owners of Dynabooks — especially children — should be able to do with them. Besides referring to the above “What is a Dynabook?” essay, I’ll try to make a few more comments here.

And these comments are most critical of the general attempts over the years to come up with a really good computer system for “children of all ages” — and especially my own attempts with the enormous help of many talented colleagues. We didn’t get there (and part of this was our own fault).

Cutting to the chase, the most “Dynabook-like” system for children was “Etoys”, which was done in the late 90s and released in the early 2000s. This incorporated many really good ideas from many sources — including the LOGO ideas of Papert, the Smalltalk ideas from Xerox Parc, the Hypercard ideas from Apple, some of the Engelbart collaboration ideas, etc. — into a system for 5th graders that allowed a number of breakthroughs in helping children to think more powerfully.

http://www.vpri.org/pdf/rn2005001_learning.pdf

http://www.vpri.org/pdf/rn2005002_authoring.pdf

A side note here is that Etoys absolutely required “ideas from a community” (just as personal computing and networking required the whole ARPA community’s contributions).

For example, Hypercard was more naive than we were thinking — but it was a perfect hit in the user space we cared about — we needed to see Hypercard to help us understand what we were fumbling with.

Similarly, we needed to see Mitchel Resnick’s “Starlogo” even though we’d been beating the “simulation” drum for a long time. Starlogo was a really great example of an important part of simulation for children that helped every other more general idea we’d been trying to have.

And I could go on and on here about many other truly important perspectives on children, powerful ideas, and computing that came from others.

And, if we stay with the originally stated ideals of the Dynabook — and of the ARPA community of the 60s — then there were still quite a few important things missing in Etoys. One of the most important was age ranges — many ideas and facilities for younger and older children were left out of Etoys — and along with them scales of aspiration, really good integration with the Internet, and: the amount and quality of the packaging and support that is needed.

Some of this was supplied by a descendant of Etoys — the very popular Scratch system (which was done by some of the same researchers). But Scratch unfortunately removed many important features of Etoys — especially in the media and system simulation areas — so it can’t be put forward as a real improvement.

I think the most important point — and a way of answering this important question in a useful way — is to consider “personal dynamic media” as a kind of “embodied curriculum for civilization” and to ask and answer questions along these lines. What should we be trying to have people learn — especially children — and what are the best ways of helping them?

To me, the most interesting part of these questions is that I’m confident that both can be answered not just in the context of the original Dynabook idea, but in the context of today. There’s quite a bit of work needed to catch up to 1968 — but this is doable now — and there’s quite a bit more work needed to catch up to our world in 2016 — and this requires another large push like the one my research community made in the 60s and 70s to invent personal computing and world-wide networking.

But we can do it — and I think we must!

Is Alan Kay correct that building software is still like the design and construction of ancient structures like the Pyramids, in the era before architecture?

The essay quoted was written in 2001. And, 15 years later, it’s still a good question, but I think harder to answer — and especially for me: I haven’t been tracking software development generally for a number of years (and there’s been a lot more of it).

In any case, my essay wasn’t really about Egyptian pyramids, but about whether there could be a real “computer science” that could lead to a real “scientific software engineering” in similar ways to the revolutions in designing, testing, and fabricating physical, electrical, and biological structures over the last 70 years or so.

If I were to write the essay again, I’d try to be more clear about the above paragraph, and would focus less on “late-binding” (most definitely needed), and more on “debuggable designing” (I think more the key to the needed processes).

I think there are really good analogies to the current CAD->SIM->FAB revolutions we’ve seen in other areas of engineering and manufacturing. We need to design most clearly in terms of meaning, we want the system to simulate our designs to help us see whether we’ve said what we mean, and, though copying a software system’s bits is easy and cheap, the main kinds of optimizations and fits to existing systems are not (I’ll take that part of the process as what we mean by “FAB”).

Even before we get to thinking about what the front end for this new take on software engineering might be like, we can see that in most existing programming languages, programming practices, and programs, that meaning has not been separated out from optimization — they are almost always commingled, even in attempts at very high level languages. We could imagine a language in which optimizations were “hints from the side”, and the programs themselves are just about meanings, but I don’t know of one currently.

A more often used practice is the opposite: to use rather ad hoc methods in writing the main program and then try to supply “meaning hints” from the side (in the form of assertions, unit tests, etc.). These don’t help the system make a better program, but just trap more errors.

We could imagine a future system in which most of constraints expressed by the unit tests, etc., are the main program, and the pragmatics and optimizations are automatically created to deal with constraints on resources.
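A deliberately naive sketch in C of that inversion (not anyone’s real system, and every name below is invented): the “meaning” is only the two checks, and the engine that satisfies them is separate and replaceable. A real system would need a vastly better engine, but the point is that the meaning would not have to change when the engine does.

```c
/* A deliberately naive sketch of "the requirement is the program":
   the meaning is only the two checks below (sorted + permutation of
   the input); the "engine" that satisfies them is separate and could
   be replaced by something fast without touching the meaning. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool is_sorted(const int *a, int n)
{
    for (int i = 1; i < n; i++)
        if (a[i - 1] > a[i]) return false;
    return true;
}

static bool is_permutation_of(const int *a, const int *b, int n)
{
    /* occurrence counts match for every value (quadratic, but it is spec) */
    for (int i = 0; i < n; i++) {
        int ca = 0, cb = 0;
        for (int j = 0; j < n; j++) {
            if (a[j] == a[i]) ca++;
            if (b[j] == a[i]) cb++;
        }
        if (ca != cb) return false;
    }
    return true;
}

/* A dumb engine that happens to satisfy the requirement: repeatedly
   swap adjacent out-of-order elements until is_sorted holds. */
static void satisfy(const int *in, int *out, int n)
{
    memcpy(out, in, n * sizeof *in);
    while (!is_sorted(out, n))
        for (int i = 1; i < n; i++)
            if (out[i - 1] > out[i]) {
                int t = out[i - 1]; out[i - 1] = out[i]; out[i] = t;
            }
}

int main(void)
{
    int in[] = { 5, 1, 4, 1, 3 }, out[5];
    satisfy(in, out, 5);
    printf("meets requirement: %d\n",
           is_sorted(out, 5) && is_permutation_of(in, out, 5));
    return 0;
}
```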

Most of these comments apply to the impetus for “higher level languages” in the late 50s through the 60s and 70s. Many of the inventors of high level languages back then wanted “more meaning and less hacking”.

From my perspective, this impetus has not been carried through strongly enough since the commercialization of personal computing and networking that started in the 80s. Most of the languages used today do not go beyond the languages of the past, and it is also quite startling to see so many programmers of today not really using “CAD” for programming, but instead there is much clinging to simulated Vi terminals, which are simulated punched cards, etc. (It is quite startling to see the CAD tools used in the other “real” engineering professions!)

Just one more comment here … If there was a real “computer science” we could imagine several ways to approach this problem. For example, it is highly likely that programming and debugging in terms of meanings, especially for systems that have real-time constraints, is going to require a lot of computing power to do, and especially to initially develop. This would suggest that such computer scientists would be developing their ideas on super-computers of one sort or another (at least what used to be called “work-stations”, if not full-fledged larger more powerful systems). In other words, to do CAD->SIM in ways that might not directly be connected to deployment, but to use the FAB part to do the work needed to deploy.

Instead, what we see almost everywhere today are programmers programming on the target systems, and this has most things wrong with it — not the least that it almost forces premature optimization, which then puts a chill on further designing and reformulation of the design.

I will submit here what I think is a much better process, that was used in the ARPA/PARC community in the 60s and 70s: do development on “computers of the future” to find out what is actually needed, and then find ways to deploy (one of these is that it takes a while to develop software, so you should at least use computing power of several years in the future). Another is to also develop hardware in various ways to help both development and eventual pragmatic deployment.

It’s hard to take our not-quite-a-field seriously when most of its practitioners don’t seem to take it seriously.

Where can I find/listen to music recordings featuring Alan Kay the computer scientist?

I played jazz guitar around NY and Denver from about 1957 to 1966, and wound up quitting the next year because I couldn’t do the practicing I needed and do grad school at the University of Utah at the same time. (I was the kind of player that needed to practice a lot to keep up with players of more intrinsic ability.) I don’t know of any extant recordings from then (although one of the groups I played in did have some recording sessions in the late 50s). Many years later I started classical pipe organ (but not as a professional) for my own pleasure.

What was the first novel for Daniel Defoe?

I didn’t know much about Daniel Defoe — except Robinson Crusoe, Moll Flanders, etc., and that he was very early in the English Novel and a prolific writer.

But the Wikipedia article was very interesting — did you consult it? — it lists Robinson Crusoe as his first novel — and the sketch of his life got me to start looking at more primary materials (because I’ve found Wikipedia articles on things I have detailed knowledge about to be inaccurate).

This prompted me to go to Amazon and order “Daniel Defoe: The Life and Strange, Surprising Adventures” by Richard West for 36 cents plus postage (seems like a good deal for anyone interested in ideas).

How does one explain the phenomenon of "being on autopilot"?

Almost all of our behaviors — including much of what we like to call “thinking” — are done “on autopilot”.

Perhaps too simple — but I think still useful — is what Daniel Kahneman in his book “Thinking Fast and Slow” calls “System 1” (all the mechanisms we have to deal with things quickly in real-time) and “System 2” (all the mechanisms we have to ponder slowly). He calls these “explanatory fictions” (we can think of them as “useful fictions”).

System 1 is quick to react and difficult to teach (and unteach). System 2 is slow to react but can learn more readily. For most things we start out using System 2, and eventually System 1 is able to take over more of the routine tasks.

A good example is that I can tell you that I’m going to slam a door and then proceed to do so. Despite having been warned, you will instantly jump and start secreting adrenaline, dopamine, serotonin, etc., all of which help in “fight or flight” reactions.

Similarly, no one goes on a roller coaster expecting to die, but your System 1 doesn’t know about this benign expectation (it really can’t communicate much with System 2), so it reacts violently. The pleasure at the end comes from the high from the “free dope” you’ve just given yourself.

If you remember learning to drive, there was a lot of chaos, tunnel vision, difficulty of hearing your instructor or parent, what gear?, what stop sign, what child in the road? This is the general state when confronted with real-time demands for most behaviors without a trained System 1 to react quickly more or less reasonably, and only the slow System 2 tries to deal with the demands. (This is what happens with many learning situations for children in school.)

Weeks later you are driving along, having a conversation with the person next to you, seeing stop signs, children playing, etc. And you are generally mostly aware of the conversation, and only become aware of the situation outside when something needs to be dealt with.

The tradeoffs have to do with our limited capacities. Our mind is set up to routinize as much as possible. And most of the mechanisms in our brain are aimed at routinization. This makes us more efficient at the cost of flexibility. And at the cost of not being very aware of what and why we are doing most things.

How can you add more value as a user interface or user experience designer in the team?

The reason even the few good UI designs are “so-so” is that we still don’t understand enough about human beings to do great designs, especially ones that — like musical instruments — can elevate human expression and thought as they are learned and used.

The reason most UI designs are pretty terrible is that most UI designers actually know very little about humans and most aspects of design. Add to this that so many “computer people” are indifferent to UI (some are actively against putting time and effort into it), and you have a recipe for further bad designs.

The slippage in standards has been great (e.g. most apps on smartphones and tablets aren’t set up for UNDO; they aren’t set up to help the users learn the gestures that would make them powerful users; etc.). And these comments are just on UI ideas from 50 years ago.

So: put a lot of effort into learning what is known about human psychology and anthropology (including the biological aspects). Learn about human learning of various kinds. Look at great designers and UI thinkers within and outside of computing. And try to teach your colleagues on your team about UI and why it is really critical.

Are there any books (apart from 'Dealers of Lightning' and 'Fumbling the Future') about PARC's rise and fall from grace?

“Fumbling The Future” is a poor book to read. Michael Hiltzik, who wrote “Dealers of Lightning”, did a fair amount of research, but the book is marred by “the heroes’ journey” tropes and confusing timelines.

“The Dream Machine” by Mitchell Waldrop, on the other hand, is very good, in part because it is the story of the larger enterprise started in the early 60s by JCR Licklider for ARPA, and PARC was an extension of this mostly stocked with former ARPA researchers. In this history you get the whole sweep of the very romantic initiative. There are a few quite minor errors, so this rates high in accuracy also.

You might also be interested in a tribute I wrote to the larger community: The Power of the Context, which has an extensive bibliography that includes a number of the personal histories of PARC researchers.

Has anyone criticized Alan Kay?

As Mark Miller mentioned, Dijkstra once said that "Object-Oriented Programming was such a bad idea that it could only have come from California"! However, he and I were friendly, so I took that as a backhanded compliment. My research group at Xerox Parc was known as "The Lunatic Fringe", usually (I thought) affectionately. I don't count trolls on reddit and slashdot and elsewhere.

There are two kinds of criticism that one can't do much about. One is represented by difficulty in getting good funding (meaning funding that is like the classic ARPA-IPTO funding of the 60s, and its continuation in many ways at Xerox Parc). The move away from that kind of funding represented a change in how people thought societal and business investment should be done, even as the fruits of the very different ARPA way of funding research were generating an entirely new industry and literally trillions of dollars of new wealth.

The second form of criticism is also oblique: failure of most people to put in the effort to more fully understand the ideas that were really good. Engelbart is the prime example here -- the lack of interest in CS in his best ideas has been distressing for years -- but this has been the case for most of the best ideas that came from Parc as well.

Certainly, the greatest critic I'm aware of is myself. My temperament is basically that of a romantic idealist, and a "starter" rather than a finisher. I've been on myself since a kid for everything in myself that doesn't measure up to idealizations, and for not doing a good enough job getting stuff done. And still am.

I was very lucky 50 years ago to have accidentally wandered into one of the ARPA projects (Utah) where they were engaged in the most romantic technological ideas I had seen: personal computing and world-wide networking. They also had a zeitgeist about how to go about doing things (that I later learned came from how the precursors of these efforts were learned in WWII -- especially the radar effort at Building 20 at MIT). I started to acclimate into this rhythm, and this required a combination of compartmentalizing the ideals into a room I could visit but was no longer my whole world, learning to work better, some therapy to deal with the positive and negative consequences of forming a group (which among other things would contain "finishers"), etc.

I just saw a wonderful thing. Johannes Haushofer of Princeton, has just posted a CV of his failures: https://www.princeton.edu/%7Ejoha/Johannes_Haushofer_CV_of_Failures.pdf This appeals to me very much, and I will try to follow suit this year.

Finally, it's worth looking at "crazy". As Korzybski and others before him have pointed out, we are doomed by nature to deal with the world via mere beliefs that just can't be in accord with "what's out there?". We are all delusional, and by rights should consider ourselves and others as "crazy" -- that's a good place to start in trying to better "see" the world (and it's where science starts).

However, humans generally consider that which is plentiful in their environment as "normal", and that "crazy" is a set of beliefs that is not plentiful. This obtains to some extent in science as well, because scientists are human also, and it's hard not to drift into believing even when one is trying to be scientific.

Some percentage of people find "normal" less binding, and can generate "crazy" ideas more easily. Most ideas are mediocre down to bad (it's quite difficult to have a good idea, just because they are so rare and unusual). However, if I can use a baseball analogy, there are two kinds of "not as hoped". A good batter will still perhaps not get a hit more than 70% of the time, and this "not get a hit" is not considered an error, but intrinsic overhead when trying to do something difficult. On the other hand, not catching a fly ball is considered an error, because it is supposed to be within the range of learned technical skill. If we translate this into the world of technology, having really good ideas is like trying to hit the baseball, we are lucky if a small percentage are really good, but it is an error if we can't make a good idea into an artifact that allows testing out the idea (we should have learned the technique to do that).

A good heuristic here is from Yogi Berra: "You can't think and hit!" Meaning, you have to let the ideas come without a lot of prejudgment (swing the bat!), and then think a bit. This is kind of "crazy".

The fun thing about our century is that there is enough known about the physical and mental worlds to allow some vetting of crazy ideas. And lo and behold, some of these look as though they could actually be pulled off, perhaps with a decade or more of effort. After this effort is done, there is even more opposition to the validated idea (it is an attack on many people's sense of identity and beliefs). 30 years or more later it has become a new normal -- and a new barrier for making progress.

From my standpoint, the "craziest" idea that I've had -- not an original one at all -- is that education and its processes can be improved enough and deployed universally enough to counter our dangerous genetically endowed tendencies, which might have been useful for survival once, but which now threaten both our species and our planet. There is no intrinsic reason for supposing that the little bit of progress we've made in these directions via learning and changing our cultures could be carried far enough to not just stave off disaster, but to create a new conception of humanity.

Still, I'm most definitely crazy enough to keep working on this.

As a computer science major, I want to develop my mathematical problem solving skills. So, what books are good for developing problem solving skills?

I like both the previous answers, especially the "until your fingers bleed" one! I should add the classic by Polya, "How To Solve It", to the reading list. For a more contemporary look I suggest tuning into the Heidelberg Laureate Forum in September, where Abel, Fields and Turing laureates gather each year to meet with students and talk about issues: I noticed that this year one of the sessions will be on the expanded range of "what constitutes a proof" these days.

Back to "bleeding fingers" and a musical analogy. It helps a bit to read books, etc., about the musical instrument and music you are trying to learn to play, but the main problems have to do with teaching your interior "System 1" and "System 2" that the instrument is an extension of your body, mind, will and art.

A big part of this is "training intuition", particularly for understanding and for improvisation. When you start off in any of these areas it is like learning to drive a car -- things are happening too fast, tunnel vision, you don't know what is going on, etc. -- because you need "System 1" to have set up hundreds of things that are automatically paid attention to, and this hasn't happened yet.

Throw in "solving" for "composition", "improvisation", "proving", "inventing", etc. and it's initially like being in an ATV in the dark driving across an unknown landscape with a 1 watt bulb. You can't "see" where it would be good to go! Many "bleeding fingers" later your headlights illuminate more and more and allow you to pick fruitful paths through the obstacles and distractions.

When I was a math major I asked one of the grad students how to do it, and he said "Well, you have to guess the answer, and that will tell you how to do the math"! Some "bleeding fingers" later, I was starting to make better guesses.

(A good heuristic here is to find a way to avoid being distracted by your guesses, even when they get better -- System 1 is quite a terrible thinker, and most ideas are mediocre down to bad!) I like to write down guesses in a notebook and then not look at them. This allows much more of the terrain to be seen.

The other heuristic is a more drastic one, and more useful outside school: "the problem is not the problem" -- meaning: it is worth finding the real problem before trying to solve the apparent problem (which is usually tied to a weak and obsolete context). I used to get in trouble in school for ignoring the problems I was asked to solve and going after ones that seemed "better" (but this propensity helped greatly after I got out of undergraduate levels and started working on real problems).

And, if you are a computer science major, then it is important for you to understand that "mathematics" is a plural for a reason (namely, so we can invent a math when we need to make progress). Computing constitutes a new area of relationships which allows mathematical possibilities, but some of the important properties require new ways to think about them (some of how computing has gone astray has come from trying to apply too much old-style thinking to what is new and different).

Finally, when a mathematician comes in strong contact with real engineering there is a moment of shock when both the nature of proof and much of the method can be seen as "good engineering" (in this case of relationships -- or as von Neumann liked to say "mathematics is relationships about relationships"). Engineers are often a bit cavalier about math, but they have the same shock when they really get into it, and often find that some of the heuristics that work for math also apply in physical world thinking.

The current day Venn diagram of these processes has (in historical order) Tinkering, Engineering, Mathematics, and Science -- most of the expert practitioners find the sweet spot in the intersection and use aspects of all to make progress. I advise "bleeding your fingers" to become an expert in all four.

[1] Thinking, Fast and Slow

Can we have garbage collection and memory management on a fully-compiled language? That is, no VM or interpreter required?

Sure!

As others have also answered "yes", I thought I'd just add a couple of side notes.

First, there are many techniques for storage reclamation, and it's worth looking at them. We used reference counting at Parc in the 70s because we had real-time music and animation concerns on relatively small machines. However, when we did Squeak 20 years later, there were many more GC techniques to draw on that could handle real-time very well, and provide other services not easily done with reference counting. (We used a "generational GC with 'tenuring' " that constantly collected a little, and always provided "head-room" that allowed massively parallel real-time processing -- part of the idea was due to Dave Ungar.)
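
To make the contrast a little more concrete, here is a toy sketch of a generational collector with "tenuring": new objects start in a small "young" space that is collected often, and objects that survive enough collections are promoted to an "old" space that is collected rarely. This is only an illustration of the general shape of the idea -- the names and the tenure threshold are made up for the example, and it is not the actual Squeak collector.

    # Toy sketch of a generational collector with "tenuring".  Names such as
    # TENURE_AFTER, Heap, etc. are invented for this example only.

    TENURE_AFTER = 3          # survivals before an object is promoted to old space

    class Obj:
        def __init__(self, name, refs=()):
            self.name = name
            self.refs = list(refs)   # outgoing references to other objects
            self.survivals = 0

    class Heap:
        def __init__(self):
            self.young = []          # newly allocated objects live here
            self.old = []            # tenured objects; collected much more rarely
            self.roots = []          # globally reachable objects

        def alloc(self, name, refs=()):
            o = Obj(name, refs)
            self.young.append(o)
            return o

        def minor_collect(self):
            """Trace from the roots (and from old space), keep reachable young
            objects, and tenure those that have survived enough collections."""
            reachable = set()
            stack = list(self.roots) + list(self.old)
            while stack:
                o = stack.pop()
                if id(o) in reachable:
                    continue
                reachable.add(id(o))
                stack.extend(o.refs)

            survivors = []
            for o in self.young:
                if id(o) in reachable:
                    o.survivals += 1
                    (self.old if o.survivals >= TENURE_AFTER else survivors).append(o)
            self.young = survivors   # everything unreachable is reclaimed here

    heap = Heap()
    a = heap.alloc("kept")
    heap.roots.append(a)
    heap.alloc("garbage")            # never rooted, reclaimed on the next pass
    for _ in range(4):
        heap.minor_collect()
    print(len(heap.young), len(heap.old))   # -> 0 1  (the kept object was tenured)

Because each minor pass only touches the small young space, the pauses stay short, which is what made this style of collector workable for real-time media.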

For speed freaks (though speed is actually quite low on the priority list these -- and most -- days), I thought it would be fun to mention a kind of "ultimate" storage management that Ned Irons (Edgar T. Irons) used for the first really practical extensible language, IMP (CACM, Jan 1970). This was done on a supercomputer of the day (a CDC 6600), and what he wanted was no overhead on dynamic arrays compared to statically allocated arrays. He accomplished this by extending each array with space to hold back pointers to the loaded code that used the array's address. The basic idea here is that the arrays would generally be much larger than the number of back pointers needed. Then, when an array was extended or moved, he could use the back pointers to fix up the addresses in the code itself. Eeecks, but it worked quite well! It was essentially making "the linking-loader" dynamically active, etc.
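
A toy model of that back-pointer scheme might look like the sketch below: "code" is just a list of cells, some of which hold an array's base address, each array remembers which cells mention it, and moving the array patches those cells. This is only meant to make the mechanism concrete; it is not Irons' actual IMP implementation, and all the names are invented.

    # Toy model of the back-pointer fix-up idea described above.

    class MovableArray:
        def __init__(self, base, length):
            self.base = base
            self.length = length
            self.backpointers = []       # indices of code cells holding self.base

    class Memory:
        def __init__(self):
            self.code = []               # "loaded code": cells that may hold addresses

        def emit_reference(self, array):
            """Compile a code cell that refers to the array's current base address."""
            self.code.append(array.base)
            array.backpointers.append(len(self.code) - 1)

        def relocate(self, array, new_base):
            """Move/extend the array, then patch every code cell that mentioned it."""
            array.base = new_base
            for cell in array.backpointers:
                self.code[cell] = new_base

    mem = Memory()
    arr = MovableArray(base=1000, length=10)
    mem.emit_reference(arr)
    mem.emit_reference(arr)
    print(mem.code)            # [1000, 1000]
    mem.relocate(arr, 2048)    # the array grew and had to move
    print(mem.code)            # [2048, 2048] -- the code was fixed up in place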

How does imagery affect tone in literature?

This sounds like the "criticism" classes I was able to avoid in both English literature and in music. More worthwhile using the time to do more reading and playing! (Since most ideas -- and books -- are mediocre down to bad, it's a good idea to encounter lots of them to better the chances of "cosmic collisions".)

But you can also answer this question quite easily yourself just by looking at the official definitions of the terms.

"Tone" is basically attitudinal: formal, carefree, humorous, sarcastic, etc.

"Imagery" applies to sense-like descriptions: scenes, smells, sounds, etc.

Pick an attitude and then pick sensory examples that help embody it or are opposed. Pick a sensory example and then pick attitudes that go along or the opposite.

Good writers use lots of devices, but lots of devices don't make something worth reading. This is similar to the problem of sculpture: clay is completely malleable, but it can't simply be debugged into art.

Vision is needed.

Much better to start with searching out important ideas that are worth explaining to others, and then to find as many ways as possible to help them understand the importance.

I don't understand properly Alan Kay's quote about his math background. Can anyone explain it in layman terms?

Sometimes things can be nicely simplified yet made more powerful at the same time. This is often "what 'math' is good for".

When I looked at how Ivan Sutherland had organized the objects in Sketchpad -- (1962) the first real interactive graphics system (and much much more) -- one of the things he did was to have each object have a "place" called "display" and all you had to do to get any kind of object to display itself was to call the subroutine at "display" (which now became a generic label for the "idea of display").

This removed the need for the invoker to know how to display the object, or even to know the particular name of its display routine. Each object was its own name-space, and Ivan realized (with the help of some ideas from Doug Ross and others) that "separating concerns" like this really simplified design and programming.
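
In a modern language the idea might be paraphrased something like the sketch below -- a paraphrase only, since Sketchpad itself was not written this way and these class names are invented. The caller knows only the generic label "display", and each kind of object supplies its own meaning for it.

    # A paraphrase of the "generic display slot" idea, not Sketchpad's mechanism.

    class Line:
        def display(self):
            print("drawing a line segment")

    class Arc:
        def display(self):
            print("drawing a circular arc")

    class Instance:           # a composite that delegates to its parts
        def __init__(self, parts):
            self.parts = parts
        def display(self):
            for p in self.parts:
                p.display()

    def show(scene):
        # The caller knows only the generic label "display"; each object
        # supplies its own meaning for it.
        for thing in scene:
            thing.display()

    show([Line(), Arc(), Instance([Line(), Line()])])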

The idea of "least knowledge" is a very powerful way to think about making modules that can cooperate.

This is what "algebras" in math do. They started out as generalizations of things and operations like + and * and identities, etc. that could be used over a wide variety of things. This often works nicely because there are often many more kinds of things than there are powerful operations. And there are often many more ways to do things than good things to do (for example "sort" is a powerful idea, and there can be many dozens of different ways to sort, but the end result is the same. The differences are "pragmatic" rather than "semantic").

My inner hit was: "wow, we can make a really powerful but as-simple-as-possible interface language between objects, and we can hide the pragmatics inside of objects, etc."

Today this property of modules/objects is usually called "polymorphism" (a not quite accurate term, but I think Peter Wegner tagged it, and it has stuck).

What are proxemics in drama? How are they implemented?

I don't think I know anything special about this subject, beyond just the rules of thumb that are used on the stage to relate actors to each other spatially, and sometimes to "special objects".

For a given play and given actors, the same director can be very influenced by the size and kind of stage (proscenium, thrust, in the round, etc.) and the amount of rake and upstage available. And staging will normally be quite different for movies and TV, where cutting, closeups, etc., can be handled quite differently because of the different visual angles available.

Staying with the theater: actors "need stage", so scenes of close relations are sometimes staged with the actors further apart than they would be in real life, but with the use of voice, posing and gestures to create intimacy. As an aside, "posing" is a critical element of theater, where the distance from the audience and the desired projection of "generalizations beyond the characters" require much slower movement than real life, and something more like mime and tableau, to help the audience understand what is going on and being felt. I mentioned the importance of "masks" in a previous answer, as another critical element to get beyond "people walking and talking".

A larger set of things to look into is the role of caricature in all arts. "What's there" is put there to help the audience make something special inside their heads, so much of the "art of Art" is choosing "what" to put "where".

What is the primary reason Willy Loman kills himself in "Death of a Salesman"?

One way to approach this is to look up the definition(s) of classical "tragedy" and see what a playwright might try to do today in a more inclusive society -- one in which there is an attempt to treat the larger bulk of humanity as important.

Then, you can ask the first question here -- and get a handle on the answer: what was Miller trying to accomplish in this play?

The next question might be: how successful was he for you?

My opinion doesn't matter. This is about you and your question. But just to slide my perspective into the conversation: I think Miller was more successful in raising the idea -- that any human life can be tragic -- than he was in making this particular play work.

To me, a much more interesting book (not a play) is "Man's Search For Meaning" by Viktor Frankl. I think it gets to more of the important matter than Miller does.

What were Sophocles' contributions to theatrical drama?

I thought the Wikipedia article on Sophocles was pretty good -- I suggest perusing it.

The corresponding Wikipedia article on Greek theater was less comprehensive -- and then there is the question of "theater" itself, and the means and purpose of the tragedy. A good book to look at is "The Magic Mirror" by George Jean Nathan. The title is one of the ways theater people think of what the theater is about: its purpose is to "beam the audience back out at themselves". Coleridge, who was a theatrical critic as well as a poet, once said "People go to bad theater hoping to forget, but they go to good theater tingling to remember". As with many arts, the aim is not so much to teach as to evoke.

Because of the strange -- and very "theatrical" -- ways our minds work, we are able to be drawn into a kind of waking dream that is real enough for tears, anger, excitement, and most other human emotions. Part of the "trick" is to find ways to get the audience out of "seeing people walking and talking" and instead have the audience enter into the world of the characters, where there are no actors, just the special reality we can make with our minds. One of the ways the Greeks accomplished this was through masks (it is worth using the Internet to find out about masks and how they and makeup and costumes have been used for thousands of years to help imaginations see what "needs to be seen"). It's important to know that Greek audiences could be frightened by some of the masks, and to realize what this means.

The closest normal stage process to Greek Theater today is the oratorio (e.g. Handel's Messiah). There are a few "soloists" -- usually no more than three (and Sophocles was credited for adding the third one) -- and a chorus. The level of actual singing is not completely known. (Both the "opera" and the "oratorio" were late 16th century inventions to try to revive something like Greek theater.)

My most direct personal experience with Sophocles was in the staging of Oedipus at Colonus. This was done as much as possible the way it is thought the Greeks might have done it. Besides the oratorio-like staging and delivery, the thing that worked to get all of us into it was to attend a number of Catholic Masses done old style in Latin with all the traditional touches. Why? Because Greek theater came out of the religious "mystery plays" that often had strong touches of the transcendental, even to the point of inhabitation by a god.

When everything is "working", the stylized declamations and almost no physical movement, coupled with the masks and music, create something like an other worldly mythic transcendent experience, more associated with religion and with art music today than with most theater. It is the opposite of "naturalistic".

In the production I helped with, we were lucky to have an actor with a truly enormous voice, and for him, all the stylizations fit him perfectly. His vocal impact turned out to be critical to help the audience get used to "tableaus" and "posing", etc.

I have since seen a number of Shakespearean productions -- mostly in the UK -- staged very similarly to what little we know about Greek theater practice. I found them especially effective in getting the audience to really listen to the words as the action.

Is a microservices architecture with RESTful APIs an implementation of Alan Kay's concept of object-oriented programming?

I was asked to write a history of Smalltalk for the ACM History of Programming Languages II conference in the early 90s -- an online version is: The Early History Of Smalltalk. This will make more clear some of the influences.

For our purposes here: while a grad student at an ARPA project in 1966, besides the catalyzing influences of Sketchpad, Simula, and Biology, I was interested in (a) the notion of "processes" in operating systems (they were sometimes called "virtual machines"), which were rather similar to what are called "processes" or "tasks" today -- i.e. entities encapsulated and protected via MMUs that acted like the entire hardware -- and (b) the just-starting-to-be-talked-about ARPAnet project, which proposed to use packet multiplexing.

The simple insight I had (just about 50 years ago now) was that you only needed "computers all the way down" -- because "a computer can do anything a computer can do".

I doubt that I was the first to have this insight (it's so simple -- almost tautological) but if not the first, then I think I might have been the most cosmically taken by the idea. It was so powerful in so many areas, especially because I was not of the psychological type that delights in "cleverly coping". It was kind of a "math insight" (one of my majors in college was pure math, specializing in algebras). Basically, semantics would be in terms of behaviors, and pragmatics could be quite orthogonal (and interpluggable).

This followed very strongly the simulation perspectives of Sketchpad and Simula -- if your basic element is "a computer" you can make it simulate anything, including basic elements. This fit strongly into another zeitgeist of the times: extensible "problem oriented" languages.

Combined with the "protected process" ideas, it seemed pretty clear that you didn't want a sender to be able to control a receiver. So you would start out with "requests" rather than "commands". However, because you could simulate anything, you could choose to have receivers interpret some requests as commands (and thus you could simulate old-style programming with data structures and assignments). But this was clearly a very bad idea because it violated encapsulation of a process. And this violated modularity -- whose whole point is to isolate and concentrate knowledge in a complex system.

A "request" oriented system is one made up of servers with services, and the servers can choose to serve or not. (Take a look at CAL-TSS ca 1969 by Butler Lampson for an example of a "capability OS" design.) Also note that there were several other OSs around or just happening (like Unix which was kind of going in 1971 and announced in 1973) that were intended to have lots of processes that could intercommunicate via messages that could be interpreted by the receivers.

Carl Hewitt's PLANNER was a set of ideas for "goal-oriented programming", and squinting the right way, one could see that this was a powerful way to think about "requests" (it was a real extension of the powers and problems one got with "interprocess communications").

I got interested in what all this meant for "language", and realized that in this kind of scheme the syntax was merely a set of forms on the requests, and that the semantics would be supplied by the receivers. Most computer people are much too ad hoc (it turns out we were also, even though we were trying to be good), so this wonderful universe of degrees of freedom needs to be given some unifying principles. This was again very much like coming up with "algebras", which unified a small number of ideas over many different specifics that were nonetheless similar in important ways.

I was also interested in the problems of definition, and that you always have the problems of "in terms of what?", and that "computers all the way down" was always a very good ultimate answer.

What remained was a lot of design probes and serious software engineering, to come up with something that was quite simple and completely comprehensive at the same time as being fast enough to do all the real-time user interface and other computing we wanted to do. (This had very little in common with later things called "object oriented programming" -- why didn't they just give it a different name? Guess ...)

For those who are interested, we resurrected one of these early systems from the 70s at Xerox Parc, literally from a disk pack that Xerox had thrown away. I used it for a tribute to Ted Nelson, and did all the visual materials for the talk with it. Alan Kay's tribute to Ted Nelson at "Intertwingled" Fest

Our view of all this stuff after the fact is that "real objects are a really good idea" (and that any kind of meaning on a computer has to be done via some form of interpretation, and this requires some form of process idea). We think that much better detailed designs than ours are possible now. And we think that much of what exists now is much too messy and ad hoc to scale reasonably. I.e., we all need to do better!

For example, I really wanted to deal with the "coordinated state" problem by using some form of McCarthy's "fluents" and "system level transactions". We and others at Xerox Parc looked at parts of this but did not get them into our practical systems. This is still an important issue, and a lot more is now known about how to do this at different scales: http://www.vpri.org/pdf/tr2011001_final_worlds.pdf
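
A toy rendition of the "worlds" idea in that report might look like the sketch below: speculative changes are made in a child world and only become visible in the parent on an explicit commit. (A real design also has to deal with conflicts and with scale; this sketch, with its invented names, ignores both.)

    # Toy "worlds": side effects are scoped to a world until committed.

    class World:
        def __init__(self, parent=None):
            self.parent = parent
            self.bindings = {}

        def lookup(self, name):
            w = self
            while w is not None:
                if name in w.bindings:
                    return w.bindings[name]
                w = w.parent
            raise KeyError(name)

        def assign(self, name, value):
            self.bindings[name] = value

        def sprout(self):
            return World(parent=self)

        def commit(self):
            # Push this world's changes up into its parent.
            self.parent.bindings.update(self.bindings)

    top = World()
    top.assign("x", 1)
    trial = top.sprout()
    trial.assign("x", 99)          # speculative change, invisible to 'top'
    print(top.lookup("x"))         # 1
    trial.commit()
    print(top.lookup("x"))         # 99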

Now, to try to answer the question. First, it's always worth letting people make their own comparisons. What do you think?

Second, I would say that REST found some of the good principles of making systems, and missed some of the others. This is too bad, since it came later, and could have been much more informed by earlier good attempts at design and systems building. This is true of the web in general (which quite missed, e.g., a number of Engelbart's much more powerful ideas, and many others as well).

Best wishes!

Will the code on p.13 of the LISP 1.5 Programmers Manual by John McCarthy and friends ever be taught in a physics class?

How about Biology in a Computer Science class? I think a way to make progress here is to try to appreciate what of the past can actually help thinking about the future.

For example, there is a lot to inspire in Lisp and in McCarthy's ideas -- and perhaps in Smalltalk also -- but there's no question that much more is needed (and that much more can be accomplished). This means that using these old "very interesting" systems as metaphors can help, but that being too slavish about the details will mislead.

Today we are in a situation where the most interesting computer artifact is the Internet, and much of its interest comes from how well it has scaled -- and much of the reason that it has scaled so well is that some of the principles of Biology were used in the design. Here we had to create "metaphors from the other direction" because the scaling of biology -- each of our bodies has trillions of cells, and each cell has billions of informationally interacting components -- is still immense compared to the piddling little computer systems we can make.

A worthwhile computer language and its artifacts to look at is Erlang -- which has much in common with the first Smalltalks at Xerox Parc, and has very little in common with the misnamed "object oriented" languages generally in use today. And there is a newer, even more interesting language in this lineage called Pony, which has more biological characteristics.

One thing to note is that as systems scale up it becomes harder and harder to centrally control them and harder and harder for the parts to interact in "gear-like" ways. A side-note is that it gets harder and harder to deal with various kinds of errors as well. Biology in contrast has "loose-coupling" at most levels of scale, and it also is set up so that many errors can be handled just by zapping (even if some of the things that get zapped are OK). Most computer systems by contrast are early-bound and tight-bound, and quite fragile to even simple errors and most kinds of zapping.
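
The Erlang-flavored version of "handling errors by zapping" can be roughly sketched as below -- written in Python only to stay consistent with the other sketches here, with invented names, and not how Erlang itself works internally. Instead of defensively repairing a failed worker in place, a supervisor simply discards it and starts a fresh one.

    # "Zap and restart" in miniature: a supervisor that restarts failed workers.

    import random

    def flaky_worker(job):
        if random.random() < 0.3:              # pretend something went wrong
            raise RuntimeError("worker crashed")
        return f"done: {job}"

    def supervise(job, max_restarts=5):
        for attempt in range(max_restarts):
            try:
                return flaky_worker(job)
            except RuntimeError:
                # "Zap" the failed worker and start a fresh one; no attempt is
                # made to repair its internal state.
                continue
        return f"gave up on: {job}"

    print(supervise("index the files"))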

Psychologically this is a problem because a lot of learning computing is unfortunately learning to cope with what's not so good, and this quickly becomes a way of life, and then a "reality" that is hard to see past.

Who was specifically responsible for the invention of the "windows" GUI paradigm at Xerox PARC?

Mr Rao has a good slant on this, in particular to give the overall credit to the larger research community -- that of ARPA-IPTO. Many of us at Parc were graduate students under ARPA-IPTO and we considered Parc "another ARPA project" (just supported by Xerox because of the vanishing ARPA funding caused by the Mansfield Amendment).

By the time Parc came about (in late 1970), there were already GUIs within ARPA and elsewhere, and many of the elements we are familiar with today -- pointing devices, clipping windows and panes, buttons, some iconic, etc. -- were already in use in isolated systems. The overlapping window Parc GUI can take credit for being a comprehensive and general design, based on a large number of ideas, both theatrical and psychological. Most of the ideas in it were supplied by several of us in the Learning Research Group, including myself, Dan Ingalls, David Canfield Smith, Adele Goldberg, Larry Tesler, Ted Kaehler, etc.

Why isn't Alan Kay's FoNC (Fundamentals of New Computing) project more discussed or replicated?

I can't answer the question directly, but I can say a few words about the difficulties. The project came from long held feelings by many of us that there was much too much code for the actual meanings in many systems (by factors of hundreds and sometimes thousands).

We had a Smalltalk system (Squeak) that could do a lot in about 200,000 lines of code, and we thought that about 100,000 of those lines were not really needed -- i.e. that roughly 100,000 should suffice to sustain a base (like an OS), an IDE, and an object system and interface that was like "apps" but much more integrated. Since the meta ideas in Smalltalk dated back to the late sixties and early 70s, and there had been some progress in meta-definition since then, we arbitrarily thought it would be fun to try for another factor of 5 for "vanilla personal computing": namely, something that could run on bare hardware and sustain what most people did with personal computers (kind of the functionality of MS Office).

We got quite a bit done over the 5 years of the project, and the yearly reports can be found on the Viewpoints website (Viewpoints Research Institute). Because of some personnel and other issues, we didn't get the very bottom of the system to quite support the rest with a total of under 20,000 lines of code, so we didn't make our goal, and then the funding ran out. (A lot did get done in under 20,000.) However, we still have an itch to take another look, and a few things have been invented since then that will help, especially at the lower levels.

It's worth pondering the difficulty of pinning down the goals here. In particular, it is difficult to express the metes and bounds of what parts of personal computing are being attempted and what parts of "operating systems" are trying to be subsumed. On the other hand, the goal is still a good one I think. 20,000 lines at 50 lines a page is about a 400 page book, and such a book, if readable, would be a great advance in expressibility over the typical cryptic forms that are hundreds and even thousands of times larger. But the previous sentence is merely a claim that needs to be carried out to have real force.