Einstein's 1905 Paper

Report of Public Meeting held in Aberdeen University on March 21 2005

Einstein's 1905 Paper on Special Relativity

Graham Hall, Department of Mathematical Sciences, University of Aberdeen

1. Introduction

Einstein's famous paper Zur Elektrodynamik Bewegter Körper (“On the Electrodynamics of Moving Bodies”) was completed towards the end of June, 1905, and published in volume 17 of the well-known Leipzig journal Annalen der Physik in the same year. This paper laid down the basis of the special theory of relativity. The title of the paper may excite a certain natural curiosity, but this is quickly removed in its first paragraph by Einstein's discussion of a series of experiments performed by the outstanding English natural philosopher, Michael Faraday (who objected to being called a physicist), in the early 1830s and also by the American physicist, Joseph Henry, at about the same time. The experiment in question involves a magnet and a coil of wire, with the latter's ends connected to a galvanometer (to measure any current flowing in the wire). If the magnet is at rest in the laboratory and the coil in motion, or vice versa, a current is registered in the galvanometer. Experiment suggests that the current produced depends only on the relative speed of the magnet and coil, that is, the same current is observed in each of the two cases. However, the standard explanation of the first case differs from that of the second. Perhaps more crucially, the theory needed to explain this at the time of Einstein's paper was inconsistent.

This experiment and subsequent discussion contain the essence of Einstein's theory. Consider an observer in space, far removed from any gravitational fields, and who is not “accelerating”. Not accelerating with respect to what, the reader might ask, but let us leave this embarrassing question until another time! Let us, for the time being, say that our observer experiences none of the accelerative (sometimes called inertial or even “fictitious”) forces usually associated with acceleration, such as those felt in an accelerating or decelerating vehicle or those “centrifugal” forces experienced on those rotating contraptions sometimes seen at fairgrounds. We will call such an observer an inertial observer. Any other observer whose motion with respect to our original observer can be described as uniform motion (motion in a straight line with constant speed or, more concisely, constant velocity) will also be called inertial. These are intended to include those observers, spatially displaced from our original one, but at rest with respect to it, by allowing the above speed to be zero. Those inertial observers such as are described here will constitute the totality of inertial observers. It is, perhaps, interesting to note, in passing, here that this terminology is that of classical theory (that is, of Newton's theory) and that this theory claimed the ability to distinguish “real” forces from “fictitious” ones and, as a consequence, the ability to distinguish inertial observers from non-inertial observers (and vice versa). This point will not be discussed any further here. It belongs more properly to the general theory of relativity which Einstein did not publish until 1916 and which claims no such ability to distinguish in either of the above senses.

In the later stages of the 19th century it was believed that no experiment could distinguish one of these inertial observers from another. In this sense there was no inertial observer who could claim to be special and hence no observer who could claim to be the “standard of rest”. Thus there was no “absolute” observer. For example if O and O’ are two inertial observers in relative motion with non-zero speed, O and O’ would each observe Newton's laws to be true in the sense that, for example, if each were to project a particle from a certain point with the same initial velocity (the same for O and O’), the description given of the particle's motion would be the same in O and O’. This assumption, which applied to dynamical experiments only (since such were really the only ones investigated up to this time), is usually called the Newtonian Principle of Relativity.

In the mid 1860s, the great Scottish physicist, James Clerk Maxwell, finalised the electromagnetic theory by writing down the equations which now bear his name and which describe, mathematically, the electric and magnetic fields arising from a certain distribution of electric charges and currents. Maxwell's equations describe electromagnetic phenomena with uncanny accuracy and were regarded, along with Newton's laws, as two of the cornerstones of theoretical physics. In Maxwell's equations there occurred a curious constant, written “c”, which arose naturally from certain constants arising from the electric and magnetic fields and which is fundamental to the theory. Maxwell was able to use his theory to suggest that light was a type of electromagnetic field and that c was its speed. This suggestion was confirmed in brilliant experiments carried out by the German physicist Heinrich Hertz in 1888 (and even earlier by the English physicist, Hughes, who, apparently, never published his work). So c is the speed of light; but with respect to which observer? This is an important point since the value of c was essentially only known from earth-bound laboratory experiments to find the above mentioned electric and magnetic constants and similarly restricted experiments to find the speed of light. Hence, while it was reasonably clear that the speed of light was important in Maxwell's equations, there was little direct evidence to decide whether this speed should be observer dependent or observer independent, other than appeals to aesthetics or mathematical elegance.

Nineteenth century physicists usually required a medium through which waves could travel, the necessity of which is easily confirmed in the case of sound waves. If light was an (electromagnetic) wave (and in many respects it had wavelike properties), a medium was required for its propagation. Since light appeared to travel to us from the sun and stars through what was believed to be essentially a vacuum, a new medium was needed to propagate it. Thus the ubiquitous “ether” was invented and any inertial observer O who had the good fortune to be at rest with respect to this ether would see light moving with speed c no matter from which direction it came. Any other inertial observer moving with non-zero speed with respect to O would then measure a speed of light which depended on this speed and the direction from which the light came, and was to be calculated in the usual way using the standard elementary relative velocity formulae.

But it is important to realise that the ether gave physicists something else; it gave them an “absolute” reference frame (that is, a privileged observer) in the universe and which was at rest with respect to the ether. Now, inertial observers could be distinguished one from another not by “mechanical” experiments with ordinary particles, but by an electromagnetic one, that is, the simple expedient of measuring the velocity of the ether (the “ether wind”) in a particular observer's frame. The negative of this velocity then gave the velocity of the observer through the ether. Thus the electromagnetic theory suggested an absolute reference frame whereas Newton's did not. It was Maxwell's suggestion to use this approach in order to identify the earth's frame by measuring the earth's “ether wind”. The theoretical basis for this experiment is a straightforward application of Newtonian relative velocity formulae but the practical aspects of it were difficult, since a certain, very small, quantity had to be measured and which seemed beyond the technology of the day. On hearing this, the Polish-American physicist, Albert Michelson, took up Maxwell's challenge. At first alone, in 1881, and then in tandem with the chemist Edward Morley, in 1887, he invented the technology required (the Michelson interferometer) and devised an experiment in the spirit of Maxwell. The remarkable conclusion to these and other similar experiments was that no ether wind was detected, that is, the movement of the earth through the ether did not show up in their experiments. Of course, there were several ingenious attempts to explain away this conclusion but, to these, there were always cleverer counter arguments. After many repetitions of the experiment, it became clear that its result was reliable and that physicists had a serious problem on their hands. 
One sad aspect of this was that Maxwell's early death in 1879 meant that he was not to see the results of Hertz and Michelson and thus the far-reaching consequences of his work.
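
The size of the “very small quantity” in the ether-wind experiment described above can be estimated with a short calculation. The following is a minimal sketch, using only the Newtonian relative velocity rule: light is assumed to travel at speed c through the ether, and the apparatus to move through the ether at speed v. The arm length and the speed used are illustrative assumptions, not figures taken from this report, and the function name is hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s (exact SI value)

def ether_round_trip_times(arm_length, v, c=C):
    """Ether-theory round-trip times for light along two equal arms.

    Returns (t_parallel, t_perpendicular) in seconds: one arm lies along
    the direction of motion through the ether, the other at right angles.
    """
    beta2 = (v / c) ** 2
    # Along the motion: light travels at c - v outwards and c + v back.
    t_parallel = (2 * arm_length / c) / (1 - beta2)
    # Across the motion: the light path is the hypotenuse of a triangle.
    t_perpendicular = (2 * arm_length / c) / math.sqrt(1 - beta2)
    return t_parallel, t_perpendicular

# Illustrative values (assumptions): an 11 m effective arm and the
# earth's orbital speed of roughly 30 km/s.
arm = 11.0
v_earth = 3.0e4
t_par, t_perp = ether_round_trip_times(arm, v_earth)
dt = t_par - t_perp
print(f"predicted time difference : {dt:.3e} s")
print(f"leading-order L*v^2/c^3   : {arm * v_earth**2 / C**3:.3e} s")
```

The predicted difference is of second order in v/c, which is why an interference method was needed to look for it.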

2. Special Relativity

Let us summarise what has been achieved so far. One has Newton's theory, which had achieved immense success since its inception in the late 17th century and which had accurately described the solar system and a host of other astronomical and mechanical problems, including the successful mathematical prediction of the existence of the planet Neptune in about 1846 by the Englishman Couch Adams and the Frenchman Leverrier. Then, later, Maxwell's electromagnetic theory arrived with similar success. But when the two were put together in the simple theoretical calculation required for the Michelson-Morley experiment, a contradiction appeared to ensue. Such was the trust held in the work of Newton and Maxwell that people were reluctant to change either, and many suggestions were put forward to rectify the situation and successfully complete the liaison between these theories. But these were mostly ad hoc, and in the very early years of the 20th century several mathematical physicists, in particular Poincaré, Lorentz and Larmor, had new ideas, suggesting that a successful solution to the problem was close. It came eventually in 1905, not from a university academic, but from a young German scientist working full-time in the patent office in the Swiss capital Berne. He was born in the same year that Maxwell died and his name was Albert Einstein.

In his 1905 paper, Einstein proceeded in a logical fashion by first clarifying the idea of a frame of reference as an inertial observer equipped with a means of describing where and when a certain event occurs. Thus each observer introduces a 4-dimensional coordinate system such that he can pinpoint any event in the universe by four numbers (coordinates), the first three indicating where the event took place and the fourth indicating when it took place. There is nothing novel in this, since a similar coordinatisation for solving mechanical problems in Newtonian theory had been employed by theoretical physicists for many years. The observer introduces, for example, the usual Cartesian coordinates x, y, z for the “spatial” location of the event and a coordinate t for the time of its happening. The x, y and z axes should be calibrated with the same unit of length and this unit should then be used by all observers. The space described by the x, y, and z coordinates is the usual Euclidean space for each observer. The time coordinate should be thought of as arising from the placing of “good clocks” one at each point in space and at rest with respect to the observer in question. Here, a “good” clock is one that is calibrated to record time in a certain unit, say seconds, and this unit should then be used by all observers. These good clocks must also be synchronised and it is here that Einstein introduces new physics. It is assumed that the clocks can be synchronised in such a way that if A and B are any two clocks (at rest) in the frame of any observer O and a distance d apart and a light particle (a “photon”) passes A in the direction AB, when A reads time t, then it will pass B when B reads time t+d/c where c is a constant (which will then become, by its very definition, the speed of light). It should be stressed here that the constant c is assumed to be completely independent of the observer. 
With such a coordinate system, each inertial observer is equipped with the means of describing the motion in space and the progress in time of any moving body and by the very way the coordinatisation has been set up any observer can now measure speeds and will find light to always have speed c. Einstein next assumes that each of these observers will observe Newton's laws to be satisfied. Then he introduces what is known as the Einstein Principle of Relativity which is that no experiment whatsoever can distinguish one inertial frame from another. (In fact, this principle was actually first enunciated by Poincaré in 1899 and 1904.) This assumption immediately clashes with the established thinking because it applies to both Newton and Maxwell theory whereas the Newtonian relativity principle applied only to dynamical phenomena (that is, to Newton's theory). However, since it cannot be denied that a purely dynamical experiment is difficult to imagine, Einstein's relativity principle has greater appeal. Einstein's other assumption is the one made above in the synchronisation of the clocks and results in the consequence that each of these inertial observers, using their space and time coordinates, will measure the same speed of light (equal to the constant c in Maxwell's theory) irrespective of the direction of the light. This second feature is perhaps the hardest to take on board since it flagrantly contradicts the usual (intuitive) calculation of velocity using Newton's relative velocity formula. An inertial observer moving head on into a light beam would expect to register an increased speed of light, but Einstein's postulate says that the same speed c will be measured. 
Einstein's assumption that such a synchronisation is possible is a tacit realisation of the failure of the Michelson-Morley experiment to detect the ether wind, but it could also be regarded, more elegantly, as arising from applying his relativity principle to electromagnetic theory (and remembering how the constant c arises in Maxwell's theory).

The next step involves the finding of a formula which will relate the coordinates x, y, z, t, of an event as registered by one inertial observer, say O, with the coordinates x', y', z', t', of the same event as recorded by another observer O’. These formulae in Newton's theory (the “Galilean transformations”) lead to the standard relative velocity rule in that theory. Since it was explained above how this velocity rule cannot now be the case, it is clear that a replacement must be sought for the Galilean transformations. These are the “Lorentz transformations” and are a little more complicated than their Newtonian counterparts. These transformations can be shown to be consistent with the speed of light being the same in each inertial frame. They were known to earlier workers such as Voigt, Poincaré, Lorentz and Larmor.
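
The two sets of transformations can be compared directly. The sketch below assumes the usual “standard configuration”, which this report does not spell out: O’ moves with speed v along the x axis of O, the axes are parallel, the origins coincide at t = t' = 0, and y and z are unchanged. Units are chosen so that c = 1 by default. The function names are illustrative only.

```python
import math

def galilean(x, t, v):
    """Newtonian rule: x' = x - v t, t' = t (absolute time)."""
    return x - v * t, t

def lorentz(x, t, v, c=1.0):
    """Lorentz rule: x' = g (x - v t), t' = g (t - v x / c^2)."""
    g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return g * (x - v * t), g * (t - v * x / c ** 2)

# A light signal leaves the common origin at t = 0 and, in O, reaches
# x = 1 at t = 1 (speed c = 1). What speed does O' assign to it?
v = 0.6
x_g, t_g = galilean(1.0, 1.0, v)
x_l, t_l = lorentz(1.0, 1.0, v)
print("speed of light for O' (Galilean):", x_g / t_g)
print("speed of light for O' (Lorentz): ", x_l / t_l)
```

Under the Galilean rule O’ would measure a reduced speed c - v, whereas the Lorentz rule returns the same speed c for both observers.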

The remainder of special relativity may, not inaccurately, be described as the theory of the Lorentz transformations. It is not difficult mathematically, but it leads to the destruction of a certain number of familiar concepts, since these concepts in the old theory were a consequence of the Galilean transformations. One of these unusual features manifests itself in a denial of what in Newton's theory is called (Newtonian) absolute time; the idea that a universal time coordinate can be given for all observers. To amplify this, we recall how an observer was “coordinatised” and, in particular, how he must only use his own clocks (which are at rest in his reference frame) and space coordinates. In Newton's scheme, the same is true except that it is now assumed that the clocks for the various observers may be calibrated and synchronised in such a way that the time given to any event is the same for all observers (that is, for all clocks). Newton's absolute time is intuitively rather nice but the assumptions made in Einstein's theory turn out to be inconsistent with it and, in the latter theory, each observer must use his own time coordinate. This “relative” time is crucial to the theory and is responsible for the breakdown (as mentioned above) of many familiar intuitive concepts instilled in us from Newton. Thus Newton's absolute time is rejected in Einstein's theory. It should, however, be pointed out that Einstein's special relativity theory does nothing to remove Newton's concept of absolute space, the latter being a constant point of attack by philosophers and a source of unease for physicists. This is essentially the problem raised earlier when the question, with respect to what is an observer accelerating, was raised.

Thus an inertial observer O in special relativity has a coordinate system and a set of good clocks (at rest relative to O) for measuring the time coordinate. Another such observer O’ is similarly equipped. Then O’s clocks will be in constant uniform motion with respect to O’ and vice versa (unless O and O’ are mutually at rest). The relativity of time leads to the following “problem”. Suppose O considers two events to be simultaneous, that is, they have the same time coordinate in his frame. If we are in Newtonian theory, O’ would also regard them as simultaneous since O’ may use O’s clocks (as a consequence of Newton's absolute time or, if you prefer, the Galilean transformations). However, in special relativity, O’ must use his own clocks and there is no reason to suppose that O’ will record these events as simultaneous. In fact, we can calculate the time coordinates of these two events from the Lorentz transformations and, in general, they will not be found simultaneous. This turns out to be a perfectly natural consequence of the way the coordinates are set up and which respects the constancy of the speed of light. Simultaneity is now relative to an observer and not a universal statement as in Newton's theory.
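
The calculation mentioned above is short enough to sketch. Assuming the standard configuration (O’ moving with speed v along the x axis of O, origins coinciding at t = t' = 0) and units with c = 1, the time that O’ assigns to an event is t' = g (t - v x / c^2), where g is the usual factor 1 / sqrt(1 - v^2 / c^2). The helper name below is hypothetical.

```python
import math

def time_in_moving_frame(x, t, v, c=1.0):
    """Time coordinate t' that O' assigns to the event (x, t) of O."""
    g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return g * (t - v * x / c ** 2)

v = 0.6
# Two events which O regards as simultaneous (both at t = 0) but which
# occur at different places, x = 0 and x = 1.
t_a = time_in_moving_frame(0.0, 0.0, v)
t_b = time_in_moving_frame(1.0, 0.0, v)
print("t' of event A:", t_a)
print("t' of event B:", t_b)
print("difference according to O':", t_b - t_a)
```

The difference vanishes only if the two events occur at the same x or if v = 0, which is the sense in which the events are “in general” not simultaneous for O’.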

The fact that simultaneity is relative rather than universal accounts for the clash between intuition and special relativity. It forces other concepts, which are based on it, to assume a relative nature. Suppose our inertial observer O wishes to measure the length of a straight rod which is at rest in his reference frame. (There is no problem with the word straight here since we have agreed that O’s “space” is Euclidean.) He would simply record the endpoint coordinates of the rod and proceed accordingly. Now suppose that O’ is another inertial observer wishing to measure the length of the same rod. He will, of course, see it moving with constant velocity (the negative of the velocity that O’ has with respect to O). Now for O’ to measure the length of the rod he must be careful (because of the rod's motion with respect to his frame) to take simultaneous readings of the rod's endpoint coordinates before he proceeds to calculate the length. (This was, of course, unnecessary for O since the rod was at rest in O.) But as we mentioned earlier, simultaneity is a relative concept, and it will possibly come as no surprise to find out that by using this common sense arrangement, the length of the rod, as measured by O’, will, in general, disagree with that obtained by O. Of course, the Lorentz transformations are required to calculate the details and it turns out that O’ will record a shorter length than O. In this sense, moving rods shorten (and the rod has its greatest measured length in that frame in which it is at rest). So just as simultaneity is relative (that is, observer dependent), so is length. (This is the well-known Lorentz-Fitzgerald length contraction phenomenon; but not quite in the way they originally imagined it!) The actual amount by which the rod appears to shorten depends on the speed of the rod and the angle between the rod's motion in O’ and the rod itself. However, there are two remarks to be made here.
First, this is not, in any sense, a physical shrinking of the rod. It is a purely kinematical effect occasioned by the way its length is measured. Second, it is a common error to suppose that one would always see such a shortening of bodies in motion. What one actually sees is a complicated combination of this relativistic effect together with the accompanying fact that viewing such a rod requires observing photons from the rod's extremities which reach the observer's eyes (or his camera) simultaneously. These photons will, in general, not have left the rod simultaneously with respect to the observer and in the time difference between their leaving, the rod will move with respect to the observer and the apparent length will change. This latter effect, which is present in Newtonian theory just as much as in special relativity, could make the rod appear shorter or longer and may swamp the relativistic contraction effect.
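
The measurement procedure described above (simultaneous readings, in O’, of the two endpoints) can be sketched numerically. The assumptions are again the standard configuration and units with c = 1; the rod lies at rest in O, and the angle theta is taken here to be the angle, measured in the rod's rest frame, between the rod and the direction of relative motion. The function names are illustrative.

```python
import math

def endpoint_in_moving_frame(x_rest, t_prime, v, c=1.0):
    """x' coordinate, at O' time t_prime, of a point fixed at x_rest in O.

    Obtained by eliminating t from x' = g (x - v t), t' = g (t - v x / c^2),
    which gives x' = x / g - v t'.
    """
    g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return x_rest / g - v * t_prime

def measured_length(rest_length, v, theta=0.0, c=1.0):
    """Length recorded by O' for a rod inclined at theta to the motion.

    Only the component of the rod along the motion is shortened.
    """
    beta = v / c
    return rest_length * math.sqrt(1.0 - (beta * math.cos(theta)) ** 2)

v = 0.6
t_prime = 2.0  # any single instant of O' will do
length = (endpoint_in_moving_frame(1.0, t_prime, v)
          - endpoint_in_moving_frame(0.0, t_prime, v))
print("rod of unit rest length, along the motion:", length)
print("same rod at right angles to the motion:   ",
      measured_length(1.0, v, theta=math.pi / 2))
```

This is a statement about coordinates recorded by O’, in keeping with the first remark above; it says nothing about what a camera would show.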

A similar calculation can be done to show that if O observes a clock moving with uniform motion he will perceive it to run slow with respect to his (O’s) clocks. To be more precise, let C be a clock at rest relative to O’ (so that C is moving with respect to O). If O records the two events consisting of two ticks of C, one second apart according to the dial on C, and notes the times on the two clocks each at rest in his frame and which are adjacent to C at the instants of these ticks, then these latter two times, when subtracted to give the time difference between them in O, will record more than one second. Thus C runs slow as measured by O. This is the so-called time dilation phenomenon.
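
A minimal sketch of this calculation follows, assuming the standard configuration with the clock C sitting at the origin x' = 0 of O’, and units with c = 1. The inverse Lorentz transformation t = g (t' + v x' / c^2) gives the times that O's own clocks show at the two ticks. The function names are hypothetical.

```python
import math

def time_in_rest_frame(x_prime, t_prime, v, c=1.0):
    """Time t that O assigns to the event (x', t') recorded by O'."""
    g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return g * (t_prime + v * x_prime / c ** 2)

def interval_measured_by_O(tick_interval, v, c=1.0):
    """Time between two ticks of C, as read off O's adjacent clocks."""
    first_tick = time_in_rest_frame(0.0, 0.0, v, c)
    second_tick = time_in_rest_frame(0.0, tick_interval, v, c)
    return second_tick - first_tick

for v in (0.1, 0.6, 0.99):
    print(f"v = {v:4.2f} c: one second on C lasts "
          f"{interval_measured_by_O(1.0, v):.4f} s for O")
```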

These two effects are perhaps the most dramatic of what might be called relativistic kinematics (that is, the relativistic theory of pure motion). But if we return to another feature of our inertial frames, that Newton's laws must hold good in them, another effect, this time dynamical, arises. It turns out that the mass of a particle is also “relative” (as had been suggested earlier by Poincaré and Lorentz), in the sense that it depends on its speed (but not on the direction of its velocity since space, being Euclidean, should have no preferential direction). In fact, it turns out that the mass of a particle increases as its speed increases and is thus least in that frame in which it is observed to be at rest. This latter mass is called the “rest mass” of the particle. Einstein deduced this intriguing result by first showing how the electric and magnetic fields change from one inertial frame to another and then using the known result concerning how such fields affect the motion of a charged particle when the latter's charge and velocity are given.
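
The dependence of mass on speed can be illustrated with the formula usually quoted in this older “relativistic mass” language, m = m0 / sqrt(1 - v^2 / c^2), where m0 is the rest mass. The formula is the standard textbook one and is not written out in this report; the function name is illustrative and units are chosen so that c = 1.

```python
import math

def relativistic_mass(rest_mass, speed, c=1.0):
    """Mass of a particle moving at the given speed (0 <= speed < c).

    Depends only on the speed, not on the direction of motion.
    """
    return rest_mass / math.sqrt(1.0 - (speed / c) ** 2)

for speed in (0.0, 0.1, 0.6, 0.9, 0.99):
    print(f"speed = {speed:4.2f} c: mass = "
          f"{relativistic_mass(1.0, speed):.4f} rest masses")
```

The mass grows without bound as the speed approaches c, which anticipates the remark made later that material particles are restricted to speeds less than that of light.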

The necessity of using the Lorentz transformations rather than the Galilean ones leads to a new formula for computing relative velocities. It is necessarily different from the usual Newtonian one (because of the peculiarity of the speed of light) and it is reassuring to learn that it is consistent with the constancy of the speed of light. The Lorentz transformations also lead to straightforward modifications to the well-known aberration and Doppler formulae, so important in astronomy and cosmology.
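
For motion along a single line the new formula is easily stated. If a body has velocity u relative to O’, and O’ has velocity v relative to O in the same direction, the body's velocity relative to O is (u + v) / (1 + u v / c^2) rather than the Newtonian u + v. The sketch below assumes this one-dimensional case and units with c = 1; the function name is illustrative.

```python
def compose_velocities(u, v, c=1.0):
    """Relativistic composition of two collinear velocities."""
    return (u + v) / (1.0 + u * v / c ** 2)

# Light (u = c) still has speed c for an observer in relative motion.
print("light, seen from a frame moving at 0.5 c:", compose_velocities(1.0, 0.5))

# Two large speeds never combine to exceed c.
print("0.9 c combined with 0.9 c:", compose_velocities(0.9, 0.9))

# At everyday speeds the Newtonian sum is recovered almost exactly.
u, v = 1e-7, 2e-7
print("small speeds:", compose_velocities(u, v), "Newtonian sum:", u + v)
```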

All the features so far mentioned are dealt with in the famous 1905 paper of Einstein. This paper also contains details of how Maxwell's equations and the dynamics of charged particles are accommodated within the theory. In another, shorter, paper published later in the same year in volume 18 of the same journal, Einstein first uncovered his “E=mc²” equation; arguably the most celebrated equation in physics. This equation takes us into quantum theory and has a story all of its own. It should also perhaps be added that Einstein's work was greatly enhanced, geometrically, by the work of Hermann Minkowski, in 1908, who cast special relativity into an elegant 4-dimensional form and revealed a rather beautiful mathematical structure for Maxwell's theory and equations. The elegant modern formulation of relativistic mechanics is largely the work of Minkowski augmented by that of Max Planck in 1906.

Three brief remarks can now be made to complete our brief discussion of special relativity. First, special relativity has no need of an ether. It does not declare that the ether does not exist but rather says that, if it does, it seems to be unobservable, and physics only deals with observables. Thus the ether is abandoned as irrelevant. Second, the feature that the speed of light is the same for all inertial observers is peculiar to the light speed and does not apply to “other speeds”. In this sense, the speed of light is “special” and this special nature reappears in the fact that the Lorentz transformations suggest that material particles and observers are restricted to speeds less than that of light. Third, the Newtonian concept of absolute space together with all its problems is present as much in special relativity as in Newtonian theory. Further discussion of this point had to wait until Einstein's general relativity theory which he published in 1916.

The special theory of relativity is now 100 years old. It has been confirmed in numerous experiments and is universally accepted. It is basic now in many branches of what might be termed “high speed” (or “high energy”) physics. The reason for the “high speed” restriction is straightforward. The Lorentz transformations, which control special relativity, and the Galilean transformations, which play a similar role in Newton's theory, are approximately the same between observers whose relative speed is small compared with the speed of light. Thus, at such speeds, the two theories are almost identical (hence the time it took for special relativity to be discovered). Only when dealing with relative speeds close to the speed of light do the two theories diverge significantly. Also, special relativity is required in dealing with high energy physics and this is where the equation E=mc² plays a crucial role.
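
The “high speed” restriction can be made quantitative. The Lorentz and Galilean transformations differ through the factor g = 1 / sqrt(1 - v^2 / c^2), so g - 1 is a convenient measure of how far the two theories disagree. The sketch below tabulates it for a few speeds; the example speeds are rough illustrative figures, not taken from this report, and the function name is hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s (exact SI value)

def gamma_minus_one(v, c=C):
    """g - 1 for speed v, computed so that tiny values are not lost.

    Uses g = exp(-0.5 * log(1 - v^2/c^2)) with log1p and expm1, which
    stay accurate when v is very small compared with c.
    """
    return math.expm1(-0.5 * math.log1p(-((v / c) ** 2)))

# Rough illustrative speeds in m/s (assumptions).
examples = [
    ("motor car (30 m/s)", 30.0),
    ("earth in its orbit (30 km/s)", 3.0e4),
    ("one tenth of c", 0.1 * C),
    ("nine tenths of c", 0.9 * C),
]
for label, v in examples:
    print(f"{label:30s} g - 1 = {gamma_minus_one(v):.3e}")
```

For small v the result is very nearly v^2 / (2 c^2), which is why no departure from Newton's theory is noticed at ordinary speeds.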
