Abstract. This article presents an overview of the mathematical formulation and foundations of General Relativity. Basic objects such as the metric tensor and the connection are introduced and given a geometric interpretation. The structure and meaning of the Einstein Field Equations are discussed in more detail, a recipe for solving them is presented, and the calculation of geodesics is explained.
General Relativity, at its core, is a mathematical model that describes the relationship between events in space-time; the basic finding of the theory is that the relationship between events is the same as the relationship between points on a manifold with curvature, and the geometry of that manifold is determined by sources of energy-momentum and their distribution in space-time. Gravity then becomes the deviation of the geodesics. Therefore, the natural language to describe space-time and gravity is that of differential geometry on manifolds. This discipline of mathematics was first studied by Monge and Gauss in the late 18th and early 19th century, and then further developed by Bernhard Riemann. It was then given a mathematically precise foundation by Élie Cartan in the early 20th century ( differential forms and connections ), and made use of extensively by Einstein when he developed his theory of relativity.
For the specific application of General Relativity, the most important and most fundamental ( physically speaking ) object is the metric tensor. This object allows us to consistently define the separation of neighbouring events in space-time, i.e. their spatial distances and separation in time. It does this by breaking down the concept of “distance” along curves into infinitesimal increments dr.
In a 4-dimensional space-time, we can make this general by further breaking down dr into increments taken in each individual direction in space and time; this is in essence just a generalisation of the well-known Pythagorean theorem, which for three spatial directions reads :
$$dr^2 = dx^2 + dy^2 + dz^2$$
Likewise, in space-time we express the separation ds as the sum of squared increments in each direction :
$$ds^2 = -dt^2 + dx^2 + dy^2 + dz^2 \qquad (1)$$
Note that, unlike the simple Euclidean geometry you will be familiar with from your school days, the geometry of space-time is Lorentzian, which means that the time-like direction has the opposite sign from the spatial directions – hence the minus in front of dt. This is so because the separation between events defined by (1) needs to be the same no matter which observer looks at it – it is the minus in front of dt which allows this to be the case.
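A quick numerical check can make the role of the minus sign concrete. The sketch below assumes units in which c = 1 and uses the standard Lorentz boost along the x direction; the function names and the sample increments are illustrative choices, not part of the original article. It compares the Lorentzian separation with a naive all-plus "Euclidean" sum for observers moving at different speeds.

```python
import math

def interval_squared(dt, dx, dy, dz):
    """Lorentzian separation ds^2 = -dt^2 + dx^2 + dy^2 + dz^2 (units with c = 1)."""
    return -dt**2 + dx**2 + dy**2 + dz**2

def euclidean_squared(dt, dx, dy, dz):
    """Naive all-plus sum, shown only for comparison."""
    return dt**2 + dx**2 + dy**2 + dz**2

def boost_x(dt, dx, dy, dz, v):
    """Increments as measured by an observer moving with speed v (|v| < 1) along x."""
    gamma = 1.0 / math.sqrt(1.0 - v**2)
    return (gamma * (dt - v * dx), gamma * (dx - v * dt), dy, dz)

# Illustrative increments between two neighbouring events.
event_sep = (2.0, 1.0, 0.5, -0.3)

for v in (0.0, 0.3, 0.9):
    boosted = boost_x(*event_sep, v)
    print(f"v = {v}: Lorentzian = {interval_squared(*boosted):.6f}, "
          f"all-plus = {euclidean_squared(*boosted):.6f}")
```

The Lorentzian value stays the same for every observer, whereas the all-plus sum changes with v, which is why only the former can serve as an observer-independent separation.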
Of course (1) can be written in terms of any system of coordinates we choose; it doesn’t have to be Cartesian coordinates. For example, the distance between neighbouring points on the surface of a sphere of radius R, written in spherical coordinates, is
$$ds^2 = R^2\,d\theta^2 + R^2 \sin^2\theta\,d\varphi^2 \qquad (2)$$
Clearly, (1) is a very general statement which holds true in all coordinate systems – all that changes is the coefficients in front of our increments in the various directions. Let us put these coefficients away into a matrix to write things a little more cleanly; for now, this matrix has only one column. Let us further denote by $x^i$ some general set of coordinates – these could be Cartesian coordinates, or spherical ones, or any other system :
$$ds^2 = \sum_{i} g_{i}\,\left(dx^{i}\right)^2 \qquad (3)$$
So we have neatly put away the coefficients that “adjust” things for our chosen set of coordinates into the matrix $g$. So far so good. There is only one issue here – we have tacitly assumed that the total separation in our general Pythagorean theorem (1) does not depend on any mixed elements of the form dtdx or dydz or similar; that is a reasonable assumption in Euclidean geometry, but we wish to find the most general definition of “distance” or “separation”, which will work for all possible geometries. For the most general case, we must permit mixed components as well; that means we have to extend our coefficient matrix into an n × n matrix, and write (1) as
$$ds^2 = \sum_{i=1}^{n}\sum_{j=1}^{n} g_{ij}\,dx^{i}\,dx^{j} \qquad (4)$$
This is still only a formula for the distance between points along the lines of Pythagoras, but written in a more general form that is valid for all coordinate systems, and any number of dimensions; the details are given by the coefficients tucked away in the matrix $g_{ij}$. This collection of coefficients, which can be written in matrix form, is called the metric tensor, or simply the metric, and expression (4) is called the line element. The line element gives the total separation between neighbouring points, and the metric tensor gives the information about the geometry in between these points, meaning it tells us how distances change with increments taken into the various directions. Because the sum symbols are unwieldy to write, we make use of the Einstein summation convention, and demand that, if we find the same index appearing both “up” and “down” within the same term, summation over all its values is implied :
$$ds^2 = g_{ij}\,dx^{i}\,dx^{j} \qquad (5)$$
Metric. The metric tensor tells us how distances follow from increments taken into the various directions; it is a function that takes as input two vectors, and produces as a result their inner product. The line element is the sum of all possible inner products along basis vectors, and gives the separation between two neighbouring points ( Pythagoras ).
Take careful note that these increments and separations we speak of are infinitesimal – they apply to neighbouring points on a manifold. If we have some extended curve C on a given manifold, and we wish to calculate the total length L of that curve, we must perform an integration of the line element along C :
$$L = \int_C ds \qquad (6)$$
To actually evaluate such a line integral, simply proceed in the same manner as you have learned in your calculus courses – find a suitable parametrisation of your curve C in your chosen coordinate system, then plug everything into the integral (6) and evaluate. A simple example of a line integral is found here.
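As a minimal sketch of this procedure, the snippet below approximates the length of a parametrised curve on a sphere of radius R, using the sphere's line element ds² = R² dθ² + R² sin²θ dφ². The helper names `sphere_metric` and `curve_length`, the step count, and the two sample curves are illustrative assumptions. The integral is approximated by summing sqrt(g_ij Δx^i Δx^j) over small parameter steps, with the metric evaluated at the midpoint of each step.

```python
import math

def sphere_metric(theta, phi, R=1.0):
    """Diagonal components (g_theta_theta, g_phi_phi) of the sphere's line element."""
    return R**2, (R * math.sin(theta))**2

def curve_length(curve, t0, t1, R=1.0, steps=2000):
    """Approximate the integral of ds along curve(t) = (theta, phi), t in [t0, t1]."""
    h = (t1 - t0) / steps
    total = 0.0
    for n in range(steps):
        ta, tb = t0 + n * h, t0 + (n + 1) * h
        th_a, ph_a = curve(ta)
        th_b, ph_b = curve(tb)
        th_m, ph_m = curve(0.5 * (ta + tb))        # midpoint, where the metric is sampled
        g_tt, g_pp = sphere_metric(th_m, ph_m, R)
        d_th, d_ph = th_b - th_a, ph_b - ph_a      # coordinate increments over this step
        total += math.sqrt(g_tt * d_th**2 + g_pp * d_ph**2)
    return total

# Circle of latitude theta = pi/3 on a sphere of radius 2: expect 2*pi*R*sin(theta).
latitude = curve_length(lambda t: (math.pi / 3, t), 0.0, 2 * math.pi, R=2.0)
# Quarter of a meridian on the unit sphere: expect pi/2.
meridian = curve_length(lambda t: (t, 0.0), 0.0, math.pi / 2)
print(latitude, 2 * math.pi * 2.0 * math.sin(math.pi / 3))
print(meridian, math.pi / 2)
```

Any other curve can be passed in the same way, as long as it is given as a function returning (θ, φ) for each parameter value.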
The metric allows us to do more than just define simple distances along curves; it also allows us to define areas, volumes, angles, and in general everything that can be deduced from the notion of inner product. In most general terms, the metric allows us to define measurements on a manifold; it is the vehicle that connects abstract geometry with actual quantifiable measurements.
The line element (5) has an interesting and important characteristic – to see this, consider once again (2). This is the separation of points on the surface of a sphere, written with spherical coordinates. However, there is nothing stopping us from expressing that same distance using different coordinate systems – we can stay on the surface of a sphere, but decide to use Cartesian coordinates instead, or elliptical coordinates, or polar coordinates, or any other system. If we do this, the components of $g_{ij}$ will all change, but the overall line element ds will not. This just means we can choose to give points on our surface different “labels” ( coordinates ), but the separation between them will not be affected by this – we are just giving different “names” to the same points. Mathematically, this is possible because the relationships between the various components of $g_{ij}$ remain unchanged, even if the components themselves change when we choose a different coordinate system. This is a general property of all tensors – they are objects that remain overall unchanged if we pick a different coordinate system. In other words, they are objects that are independent of the observer. This symmetry – the invariance under changes in coordinate basis – is called diffeomorphism invariance. A physical law expressed with tensors will retain the same form in all reference frames, regardless of the observer’s state of relative motion, or position within a gravitational field. The property of a law of physics to be the same in all reference frames is called general covariance.
General Covariance. Physical laws expressed in terms of tensors retain the same form, regardless of the observer’s state of relative motion, and position with respect to sources of gravity. The term “co-variance” here means that the tensorial quantities in the expressions “co-vary” along with any changes in coordinate basis, meaning they remain the same overall, both in form and in meaning.
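A short worked example may help here. It uses the flat plane rather than the sphere, purely because the algebra is shortest; the choice of example is an illustrative assumption. Switching from Cartesian to polar coordinates changes every component of the metric, yet the line element itself is untouched:

```latex
% Flat plane, Cartesian coordinates: g = diag(1, 1)
ds^2 = dx^2 + dy^2

% Change of labels: x = r\cos\varphi, \quad y = r\sin\varphi
dx = \cos\varphi\,dr - r\sin\varphi\,d\varphi, \qquad
dy = \sin\varphi\,dr + r\cos\varphi\,d\varphi

% Substituting, the mixed terms cancel and \cos^2\varphi + \sin^2\varphi = 1:
ds^2 = dr^2 + r^2\,d\varphi^2

% Same ds^2, but now g = diag(1, r^2)
```

The components went from (1, 1) to (1, r²), while the separation between any two neighbouring points is exactly what it was before.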
Now that we know how to express distances between events in space-time, we can use the metric tensor to define “higher level” concepts and quantities to help us formulate laws of physics in our space-time. Most notably, we can define quantities which characterise certain aspects of the geometry of space-time itself, such as its curvature. Before we do this, however, we need to ask ourselves just how a curved manifold differs from a flat one, in an intuitive way.
To answer this, consider the meaning of the operation of “differentiation” in geometric terms – when we differentiate at a point on a manifold, we actually put a tangent vector onto the manifold, the origin of which is the point in question. The set of all possible tangent vectors at a given point is called the tangent space at that point, which is generally a plane that is made up of all possible tangents. Now, if our manifold itself is a flat plane, then the tangent space is the same one at all points, and just coincides with the manifold itself. However, the same is not true if the manifold is less trivial, for example the surface of a sphere – here, the tangent spaces at different points no longer coincide  :
This is going to cause issues when we perform differentiation, because the result of the operation now suddenly depends on where we perform it, since the tangent space isn’t the same everywhere. This is unsatisfactory, so, in order to fix this, we define a new concept of differentiation which accounts for the “changes” that happen on the manifold from point to point, and hence can be applied anywhere without having to worry about the above issues. In order to define such a notion, we need a way to connect tangent spaces at different points, i.e. we need some kind of mathematical object that is able to “keep track” of changes when going from one point to another. This object is ( unsurprisingly ) called the connection, and it is formally represented by a matrix of coefficient functions called the connection coefficients, or Christoffel symbols :
$$\Gamma^{i}{}_{jk} = \frac{1}{2}\,g^{il}\left(\frac{\partial g_{lj}}{\partial x^{k}} + \frac{\partial g_{lk}}{\partial x^{j}} - \frac{\partial g_{jk}}{\partial x^{l}}\right) \qquad (7)$$
In actual fact, the above doesn’t represent the most general formula for just any connection, but rather it is the expression for a very specific connection, called the Levi-Civita connection – this is the connection we use in General Relativity. It has two very distinct properties; firstly, it is symmetric under exchange of lower indices :
$$\Gamma^{i}{}_{jk} = \Gamma^{i}{}_{kj} \qquad (8)$$
In technical terms, a connection that satisfies (8) is called torsion free. Secondly, the connection is defined such that if you parallel-transport a tangent vector on your manifold, it remains a tangent vector at all points, like so  :
That means that the inner product between tangent vectors is preserved everywhere – we say that the connection therefore preserves the metric and is hence metric compatible. These two conditions uniquely provide a natural choice of connection, which is just precisely the Levi-Civita connection (7); the Christoffel symbols for this connection depend only on the metric and its derivatives.
Connection. The connection provides a method to connect tangent spaces at different points on a manifold in such a way that all “changes” that occur between those points are accounted for. In General Relativity, we use the Levi-Civita connection, which is torsion-free and preserves the metric.
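To see how the Christoffel symbols of the Levi-Civita connection follow from the metric alone, here is a small numerical sketch for the unit 2-sphere. It assumes the standard formula Γ^i_jk = ½ g^il (∂_j g_lk + ∂_k g_lj − ∂_l g_jk) and estimates the derivatives with central differences; the function names, the step size h, and the restriction to two dimensions are illustrative assumptions.

```python
import math

def sphere_metric(x):
    """Metric g_ij of the unit 2-sphere at x = (theta, phi)."""
    theta = x[0]
    return [[1.0, 0.0], [0.0, math.sin(theta)**2]]

def inverse_2x2(g):
    """Inverse metric g^ij for a 2x2 matrix."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]

def metric_derivative(metric, x, l, h=1e-5):
    """Central-difference estimate of d g_ij / d x^l at the point x."""
    xp, xm = list(x), list(x)
    xp[l] += h
    xm[l] -= h
    gp, gm = metric(xp), metric(xm)
    return [[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

def christoffel(metric, x):
    """Return gamma[i][j][k] = Gamma^i_jk, computed from the metric and its derivatives."""
    g_inv = inverse_2x2(metric(x))
    dg = [metric_derivative(metric, x, l) for l in range(2)]   # dg[l][i][j] = d_l g_ij
    return [[[0.5 * sum(g_inv[i][l] * (dg[j][l][k] + dg[k][l][j] - dg[l][j][k])
                        for l in range(2))
              for k in range(2)]
             for j in range(2)]
            for i in range(2)]

theta0 = math.pi / 3
gamma = christoffel(sphere_metric, (theta0, 0.2))
print(gamma[0][1][1], -math.sin(theta0) * math.cos(theta0))   # Gamma^theta_phi_phi
print(gamma[1][0][1], 1.0 / math.tan(theta0))                 # Gamma^phi_theta_phi
```

The two printed pairs compare the numerical values with the known closed forms −sin θ cos θ and cot θ. The symmetry in the lower indices shows up as `gamma[1][0][1]` and `gamma[1][1][0]` agreeing.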
Using the connection, we can now define a new concept of differentiation, which is called the covariant derivative; it is denoted by a capital “D”, or double bars “||”, and defined as :
$$A^{i}{}_{||k} = A^{i}{}_{|k} + \Gamma^{i}{}_{jk}\,A^{j} \qquad (9)$$
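The covariant derivative can be similarly defined for any tensor of any rank; for example, for rank-2 tensors with two upper indices, it is
$$T^{ij}{}_{||k} = T^{ij}{}_{|k} + \Gamma^{i}{}_{lk}\,T^{lj} + \Gamma^{j}{}_{lk}\,T^{il} \qquad (10)$$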
In the above expressions, we are taking the covariant derivative with respect to the k-th coordinate, and a single bar “|” denotes the ordinary derivative. The result of covariant differentiation is a tensor, but the Christoffel symbols alone are not tensors; bear this in mind. Basically, what happens here is that covariant differentiation is made up of the “normal” ordinary derivative, plus some extra terms involving the connection – it is those extra terms which compensate for whatever changes occur on our manifold from point to point.
Covariant Derivative. This is a generalisation of the ordinary derivative to manifolds which are not flat and Euclidean; it provides a consistent concept of differentiation applicable on arbitrary manifolds.
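A small worked example shows the compensating role of the connection terms. Take the flat plane in polar coordinates (r, φ), an illustrative choice, and the constant vector field that points along the Cartesian x direction everywhere. Its polar components vary from point to point, so the ordinary derivatives are non-zero, but the covariant derivative vanishes, as it should for a field that is "really" constant. The notation follows the text: a single bar for the ordinary derivative and a double bar for the covariant one.

```latex
% Polar coordinates in the plane: ds^2 = dr^2 + r^2 d\varphi^2
% Non-vanishing Christoffel symbols:
\Gamma^{r}{}_{\varphi\varphi} = -r, \qquad
\Gamma^{\varphi}{}_{r\varphi} = \Gamma^{\varphi}{}_{\varphi r} = \frac{1}{r}

% Constant unit vector along x, in polar components:
A^{r} = \cos\varphi, \qquad A^{\varphi} = -\frac{\sin\varphi}{r}

% Covariant derivatives A^{i}_{||k} = A^{i}_{|k} + \Gamma^{i}_{jk} A^{j}:
A^{r}{}_{||r} = 0
A^{r}{}_{||\varphi} = -\sin\varphi + (-r)\left(-\frac{\sin\varphi}{r}\right) = 0
A^{\varphi}{}_{||r} = \frac{\sin\varphi}{r^2} + \frac{1}{r}\left(-\frac{\sin\varphi}{r}\right) = 0
A^{\varphi}{}_{||\varphi} = -\frac{\cos\varphi}{r} + \frac{1}{r}\cos\varphi = 0
```

In each line the first term is the ordinary derivative and the second is the connection term, and the two cancel exactly.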
But we are not done yet. What we have done so far does not yet allow us to draw any conclusions as to the curvature of our manifold; to see why, consider the case of a 2-dimensional cylinder embedded in 3-dimensional space, compared to a sphere  :
In both cases, the tangent spaces appear to vary from place to place, so, based on what I have written above, you might arrive at the conclusion that both surfaces possess curvature, since ( depending on your chosen coordinates ) you will get non-zero Christoffel symbols for both of them. However, this is not actually true – as it turns out, the cylinder is in fact flat : you can simply cut it open lengthwise and “uncurl” it into a flat plane ! This isn’t possible for the sphere, so there must be something fundamentally different between these two surfaces, something we can’t immediately see in the connection coefficients ( which, since they are not tensors, inherently depend on your chosen coordinate basis, anyway ).
Clearly, we need to find some way to actually quantify the notion of curvature, and make it mathematically precise – the Christoffel symbols alone don’t quite do the trick just yet. The key to understanding the intrinsic difference between the two surfaces in the above graphic is to consider what happens when we take a tangent vector anywhere on each of them, and parallel-transport it along a closed path, so that we get back to our original point. On the cylinder  :
While not explicitly shown ( sorry, I couldn’t find an appropriate graphic :/ ), if you were to complete the path so that it is closed, you will find that the resulting vector will perfectly coincide with the original vector you started off with, no matter which path you choose. You can do this experiment yourself at home – just roll a sheet of paper into a cylinder, and “parallel-transport” something representing a vector along the surface. In the end, it will always have the same orientation as when you started off, no matter the path you took. However, if we perform the same operation on a sphere, we will quickly find that this no longer holds true  :
The above shows two different paths connecting points p and q – a straightforward path B, and something a little more involved, path C = C1 + C2. Depending on which path you take, you end up with two different tangent vectors at point q ! The difference between a cylinder and a sphere is hence that, if you parallel-transport a vector around a closed path on a cylinder, you end up again with the original vector, whereas on a sphere the result depends on the path along which you perform the parallel-transport. And this is precisely the conceptual definition of curvature :
Curvature. Curvature measures the failure of vectors parallel-transported along closed curves to coincide with the original vectors before parallel-transport.
As such, it is an intrinsic property of manifolds, and has nothing at all to do with how it is embedded into a higher-dimensional space – the embedding of the cylinder above looks like it is curved, but in fact it is a flat surface, as evidenced by the behaviour of vectors under parallel-transport. The definition of curvature makes no reference to embeddings, and does not actually require the manifold to be embedded into any other space. It is purely intrinsic, and completely determined by measurements taken within the manifold itself. The same is true for space-time in General Relativity – it is not assumed or required that this is embedded into anything; all curvature is an intrinsic property.
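The failure of a vector to return to itself can be checked numerically. The sketch below parallel-transports a vector once around a circle of latitude θ = θ₀ on the unit sphere by integrating the transport equation dA^i/dφ = −Γ^i_jφ A^j with a fourth-order Runge-Kutta scheme. The sphere's Christoffel symbols Γ^θ_φφ = −sin θ cos θ and Γ^φ_θφ = cot θ are the standard ones; the function name, the step count and the choice of path are illustrative assumptions.

```python
import math

def transport_around_latitude(theta0, vec, steps=2000):
    """Parallel-transport vec = (A^theta, A^phi) once around theta = theta0 (unit sphere)."""
    gamma_t_pp = -math.sin(theta0) * math.cos(theta0)   # Gamma^theta_phi_phi
    gamma_p_tp = math.cos(theta0) / math.sin(theta0)    # Gamma^phi_theta_phi

    def rhs(a):
        # Right-hand side of dA^i/dphi = -Gamma^i_{j phi} A^j
        a_t, a_p = a
        return (-gamma_t_pp * a_p, -gamma_p_tp * a_t)

    h = 2.0 * math.pi / steps
    a = (float(vec[0]), float(vec[1]))
    for _ in range(steps):
        k1 = rhs(a)
        k2 = rhs((a[0] + 0.5 * h * k1[0], a[1] + 0.5 * h * k1[1]))
        k3 = rhs((a[0] + 0.5 * h * k2[0], a[1] + 0.5 * h * k2[1]))
        k4 = rhs((a[0] + h * k3[0], a[1] + h * k3[1]))
        a = (a[0] + h / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             a[1] + h / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
    return a

start = (1.0, 0.0)                                    # initially pointing along theta
around_60 = transport_around_latitude(math.pi / 3, start)
around_equator = transport_around_latitude(math.pi / 2, start)
print("theta0 = 60 deg:", around_60)
print("equator        :", around_equator)
```

Around the equator, which is a great circle, the vector comes back unchanged. Around θ₀ = 60° it comes back rotated by the angle 2π cos θ₀ = π, i.e. pointing in the opposite direction, even though it was kept "parallel" at every step. On a cylinder with coordinates (z, φ) all Christoffel symbols vanish, so the same integration would return the starting vector for any path.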
To quantify this, we translate the above definition into mathematical language – we take a vector A, covariantly differentiate it twice along two different coordinates, then do the same with the sequence reversed, and see what the difference is :
$$A^{i}{}_{||l||k} - A^{i}{}_{||k||l} = R^{i}{}_{jkl}\,A^{j} \qquad (11)$$
So, the commutator of covariant derivatives ( which is the meaning of curvature ) is equivalent to a rank-4 tensor, called the Riemann curvature tensor. It quantifies the failure of covariant derivatives to commute, or equivalently, the path-dependence of parallel-transporting vectors. If you expand (11) and collect the terms, you will find that all ordinary vector derivatives cancel out, and the Riemann tensor is a function purely of the connection :
$$R^{i}{}_{jkl} = \Gamma^{i}{}_{jl|k} - \Gamma^{i}{}_{jk|l} + \Gamma^{i}{}_{mk}\,\Gamma^{m}{}_{jl} - \Gamma^{i}{}_{ml}\,\Gamma^{m}{}_{jk} \qquad (12)$$
It is important to fully understand what we have done here – just looking at the metric alone is not enough to tell whether a manifold is curved or not. Even just looking at the connection isn’t enough yet. We have to go and actually study the effect a given connection has on parallel-transport of vectors to adequately define “curvature”. Therefore, curvature does not technically arise from a metric, but it arises from a connection defined on a manifold. In standard General Relativity it so happens that everything is computed directly from the metric, and the metric alone, but this is only because we are using a very specific connection, the Levi-Civita connection; it may not be true in other, more general cases, such as alternative theories of gravity.
Curvature. Curvature arises from the connection; in General Relativity, the connection is the Levi-Civita connection, and as such the curvature tensors can all be computed from the metric.
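As a worked illustration, here is the one independent Riemann component of the unit 2-sphere, computed from its Christoffel symbols. The sketch assumes the common convention R^i_jkl = ∂_k Γ^i_jl − ∂_l Γ^i_jk + Γ^i_mk Γ^m_jl − Γ^i_ml Γ^m_jk; other texts differ by an overall sign or index ordering.

```latex
% Unit sphere: ds^2 = d\theta^2 + \sin^2\theta\, d\varphi^2
\Gamma^{\theta}{}_{\varphi\varphi} = -\sin\theta\cos\theta, \qquad
\Gamma^{\varphi}{}_{\theta\varphi} = \Gamma^{\varphi}{}_{\varphi\theta} = \cot\theta

% The component R^\theta_{\varphi\theta\varphi}:
R^{\theta}{}_{\varphi\theta\varphi}
  = \partial_\theta \Gamma^{\theta}{}_{\varphi\varphi}
  - \partial_\varphi \Gamma^{\theta}{}_{\varphi\theta}
  + \Gamma^{\theta}{}_{m\theta}\Gamma^{m}{}_{\varphi\varphi}
  - \Gamma^{\theta}{}_{m\varphi}\Gamma^{m}{}_{\varphi\theta}

  = (\sin^2\theta - \cos^2\theta) - 0 + 0
  - (-\sin\theta\cos\theta)(\cot\theta)

  = \sin^2\theta - \cos^2\theta + \cos^2\theta = \sin^2\theta \neq 0

% Cylinder with coordinates (z, \varphi): ds^2 = dz^2 + R^2 d\varphi^2
% The metric components are constant, so every \Gamma vanishes and R^{i}{}_{jkl} = 0.
```

The sphere therefore has genuine curvature, while the cylinder comes out flat, in line with the paper-rolling experiment described earlier.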
The Riemann curvature tensor is the one object which contains all information as to the geometry of space-time, because it encapsulates all possible combinations of transporting vector components along all directions. However, this object does not actually explicitly appear in the field equations; we find instead other curvature tensors, such as the Einstein tensor, the Ricci tensor, and the Ricci scalar – these objects are all derived from the full Riemann tensor, and represent specific aspects of local curvature. If you haven’t already done so, I urge you to read my article Manifolds and Curvature to understand the geometric significance of all these objects.
The one place in General Relativity where the Riemann tensor does explicitly appear is when we want to compute what happens to the world lines of test particles under the influence of gravity. Say we pick a fiducial geodesic of some test particle, and a second test particle alongside it, and we want to examine how the separation between the two changes as they age into the future  :
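The separation vector $\xi^{i}$ evolves according to the differential equation
$$\frac{D^2 \xi^{i}}{d\tau^2} = -R^{i}{}_{jkl}\,u^{j}\,\xi^{k}\,u^{l} \qquad (13)$$
where $u^{i} = dx^{i}/d\tau$ is the tangent vector of the fiducial geodesic.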
This is called the geodesic deviation equation, and it tells us how world lines of particles in free fall behave with respect to the world line of some other reference particle. In other words, it tells us the relative acceleration of test particles. This system of four differential equations ( one for each component of the separation vector ) contains both the Riemann tensor as well as the covariant derivative; it can be cumbersome to solve this in closed analytical form, but for most simple metrics it can be done fairly straightforwardly. This equation is of central importance in General Relativity, because it provides both a way to explicitly calculate the effects of gravity, as well as an intuitive definition for the Riemann curvature tensor – put in a tangent vector ( your reference geodesic ), and a separation vector, and out comes the rate at which the two geodesics begin to deviate as they age into the future.
Curvature. The geodesic deviation equation allows us to compute the behaviour of test particles under the influence of gravity. The relative acceleration between test particles is a function of the Riemann tensor, i.e. of the curvature of space-time.
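A simple case where the deviation equation can be solved by hand is a sphere of radius R, used here as an illustrative stand-in for a curved space-time. Take two neighbouring meridians that cross the equator a small angle Δφ apart and follow them towards the pole; s denotes the arc length travelled from the equator. With the sign convention in which the relative acceleration equals minus the Riemann tensor contracted with the tangent and separation vectors, the equation reduces to a single ordinary differential equation for the distance ξ between the two curves:

```latex
% Geodesic deviation on a sphere of radius R (curvature K = 1/R^2):
\frac{d^2 \xi}{ds^2} = -\frac{1}{R^2}\,\xi

% Initial conditions at the equator: the meridians start out parallel
\xi(0) = R\,\Delta\varphi, \qquad \frac{d\xi}{ds}(0) = 0

% Solution:
\xi(s) = R\,\Delta\varphi\,\cos\!\left(\frac{s}{R}\right)

% Check: at s = \pi R / 2 (the pole) the separation is zero, as expected.
```

Two curves that start out parallel are drawn together purely by the curvature of the surface, which is the geometric picture behind tidal effects in General Relativity.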
So as you can see, pretty much everything to do with the geometry of space-time and the dynamics of test particles within it emerges in some way or another from the metric. But how do we know the metric in the first instance ? This is where the Einstein Field Equations ( EFE ) come in – they provide a constraint as to what form the metric of space-time can take in the presence of energy-momentum distributions. In their simplest form, with the coupling constant written as $\kappa = 8\pi G/c^4$, the EFE read
$$G_{ij} = \kappa\,T_{ij} \qquad (14)$$
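meaning that knowledge of the energy-momentum tensor is equivalent to knowing the average scalar curvature in each direction of space-time. Written like this, the meaning of these equations may not be terribly intuitive to the average student of General Relativity; in order to make things a little clearer, we can re-write the EFE in the following way, where $\kappa$ is the gravitational coupling constant and $T = g^{ij}T_{ij}$ is the trace of the energy-momentum tensor :
$$R_{ij} = \kappa\left(T_{ij} - \frac{1}{2}\,g_{ij}\,T\right) \qquad (15)$$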
This is the same equation as (14), even though it might not look like it at first glance. In any case, this formulation allows us to understand the EFE better – the geometrical meaning of the Ricci tensor on the left is the rate at which a ball of test particles begins to change volume V as it freely falls; the meaning of the right hand side turns out to be the sum of the energy density at the center of the ball, plus the pressures in the spatial directions. Thus we can represent the full EFE in a simpler manner  :
$$\frac{\ddot{V}}{V}\bigg|_{t=0} = -\frac{1}{2}\left(\rho + P_x + P_y + P_z\right) \qquad (16)$$
Note that some constants have been omitted here for clarity. This puts us in a position to state the Einstein Field Equations in plain language  :
Einstein Field Equations. Given a small ball of freely falling test particles initially at rest with respect to each other, the rate at which it begins to shrink is proportional to its volume times: the energy density at the center of the ball, plus the pressure in the x direction at that point, plus the pressure in the y direction, plus the pressure in the z direction.
And this is all there is to it – this is what the field equations mean. If the ball of test particles is in the interior of an energy-momentum distribution ( a planet, an EM field etc etc ), then its volume will begin to change according to (16) as time passes. In exterior vacuum, where there is no local energy-momentum, the volume of our ball of test particles remains constant over time. The EFE therefore represents a constraint on the change in volume of balls of test particles under the influence of gravity; because the rate of change in volume ( the Ricci tensor ) is a function of the metric and its derivatives, it also represents a constraint on the metric itself. Hence, the significance of the EFE is that it allows us to compute the metric of space-time from a given distribution of energy-momentum, plus some boundary conditions. Why do we need boundary conditions ? Well, the EFE is a system of partial differential equations, one for each independent component of the metric – and in order to solve differential equations, we always need to impose boundary conditions. Physically, the EFE merely constrains the rate of change of volume of our test particle ball, but it has nothing to say about how the shape of the ball evolves – we need to “pass” additional information in order to obtain an overall view on what happens to the ball.
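To get a feeling for the numbers, the sketch below evaluates the plain-language form of the field equations with the constants restored. It assumes the usual SI form (d²V/dt²)/V = −4πG (ρ + (P_x + P_y + P_z)/c²) at the initial instant, with ρ a mass density; the rounded constant values, the function names and the sample density (roughly the mean density of the Earth) are illustrative assumptions. For comparison it also computes the same quantity from Newtonian gravity inside a uniform medium, where a particle at radius r accelerates inward at (4/3)πGρr.

```python
import math

G = 6.674e-11      # gravitational constant in m^3 kg^-1 s^-2 (rounded)
C = 2.998e8        # speed of light in m/s (rounded)

def volume_acceleration(rho, p_x=0.0, p_y=0.0, p_z=0.0):
    """Initial (d^2V/dt^2)/V in s^-2 for mass density rho (kg/m^3) and pressures (Pa)."""
    return -4.0 * math.pi * G * (rho + (p_x + p_y + p_z) / C**2)

def newtonian_volume_acceleration(rho):
    """Same quantity from Newtonian gravity: V ~ r^3, so V''/V = 3 r''/r for a ball at rest."""
    radial = -(4.0 / 3.0) * math.pi * G * rho     # (d^2r/dt^2)/r inside a uniform medium
    return 3.0 * radial

rho_rock = 5514.0   # kg/m^3, approximately the mean density of the Earth
print("vacuum        :", volume_acceleration(0.0))
print("dust, GR form :", volume_acceleration(rho_rock))
print("dust, Newton  :", newtonian_volume_acceleration(rho_rock))
```

In vacuum the ball's volume does not begin to change, and for pressureless matter the relativistic statement reproduces the Newtonian result. The pressure terms are the genuinely new ingredient, and the factor 1/c² shows why they are negligible for everyday matter.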
The system of equations (14) has no general solution – in the worst case scenario, we are dealing with a system of 10 non-linear, coupled, partial differential equations, one for each independent component of the symmetric metric tensor. Mathematically speaking, this is pretty much as bad as it gets, so far as finding closed analytical solutions is concerned. The best we can do is simplify things as much as we possibly can, by exploiting all available physical and mathematical symmetries of the problem at hand.
Take as an example an astrophysical body such as a planet. In the real world, planets aren’t isolated – they are parts of solar systems, and move under the influence of their central star, as well as other planets in the vicinity. They are bombarded by smaller bodies, they carry magnetic fields, they spin around their axis of rotation, and they move through radiation fields ( solar wind ). They are also not perfectly spherical. If we wanted to accurately and precisely account for all of this, then we haven’t a hope of ever finding an exact, closed analytical solution to (14). That doesn’t mean such a solution doesn’t exist – it does, but realistically we do not have the mathematical tools to find it. It can be done numerically – using computers – of course, but our aim is to find an analytical expression for the metric, so that we can study its properties. In order to do this, we can impose the following simplifying conditions to approximate a rather complicated real-world scenario :
- The planet is perfectly spherically symmetric
- The planet is an isolated body in perfect vacuum – all external influences can be neglected
- The planet’s rotation is slow enough so that it can be disregarded
- The planet carries no net electric charge
- The planet’s magnetic field is weak
- The planet’s gravitational effect does not change over time
- Sufficiently far from the planet, its gravity can be adequately described as Newtonian
These boundary conditions are a reasonable approximation for a body such as the Earth, which isn’t too massive, too close to its central star, or rotating too rapidly. Given all of these, we can make the following ansatz for the metric outside the planet, based on a simple spherical coordinate system :
$$ds^2 = -B(r)\,dt^2 + A(r)\,dr^2 + r^2\left(d\theta^2 + \sin^2\theta\,d\varphi^2\right) \qquad (17)$$
with two as yet unknown functions B(r) and A(r). So, by imposing all of the above boundary conditions, we have straight away reduced the problem to finding just two functions, which depend only on the radial coordinate. In the exterior vacuum outside the planet, the EFE (15) then simply becomes
$$R_{ij} = 0 \qquad (18)$$
Based on the metric ansatz (17), we can now go ahead and calculate the non-vanishing components of the Ricci tensor, and set them equal to zero, according to (18). This gives us a system of differential equations to find A(r) and B(r). As it turns out, this can be solved in closed analytical form ( see here for details of how to do this ), and the solution is
$$ds^2 = -\left(1 - \frac{r_s}{r}\right)dt^2 + \left(1 - \frac{r_s}{r}\right)^{-1}dr^2 + r^2\left(d\theta^2 + \sin^2\theta\,d\varphi^2\right) \qquad (19)$$
wherein $r_s = 2GM/c^2$, with G being the gravitational constant, M the mass of the body, and c the speed of light ( set to 1 in the line element ). This is called the exterior Schwarzschild metric; it describes the geometry of space-time outside a massive body that fulfills the seven conditions set out above, and is the simplest possible ( non-trivial ) solution to the Einstein equations. We shall discuss the properties of this solution in detail in a separate article.
Schwarzschild Metric. The exterior Schwarzschild metric (19) describes space-time outside a spherically symmetric, uncharged, non-rotating, static, stationary and isolated body in an otherwise empty region. It is the simplest non-trivial solution to the Einstein equations.
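To get a sense of scale, the following sketch evaluates the length that sets the size of the relativistic corrections in the exterior Schwarzschild metric, using the standard Schwarzschild radius r_s = 2GM/c². The rounded constants, the masses of the Earth and the Sun, the Earth's mean radius and the function name are illustrative assumptions, not values taken from the article.

```python
G = 6.674e-11          # gravitational constant in m^3 kg^-1 s^-2 (rounded)
C = 2.998e8            # speed of light in m/s (rounded)

def schwarzschild_radius(mass_kg):
    """Schwarzschild radius r_s = 2 G M / c^2 in metres."""
    return 2.0 * G * mass_kg / C**2

M_EARTH = 5.972e24     # kg (rounded)
M_SUN = 1.989e30       # kg (rounded)
R_EARTH = 6.371e6      # mean radius of the Earth in m (rounded)

rs_earth = schwarzschild_radius(M_EARTH)
rs_sun = schwarzschild_radius(M_SUN)

print(f"r_s (Earth) = {rs_earth * 1000:.2f} mm")
print(f"r_s (Sun)   = {rs_sun / 1000:.2f} km")
# Size of the correction term r_s / r at the Earth's surface:
print(f"r_s / r at the Earth's surface = {rs_earth / R_EARTH:.2e}")
```

For the Earth the radius comes out at just under a centimetre, so the ratio r_s/r at the surface is of order one part in a billion, which is why Newtonian gravity works so well in everyday situations.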
Given (19), we can now go and do everything we have previously discussed – we can calculate the separation between any events in this space-time, we can examine how test particles move, we can define concepts of areas/volumes, and so on. The mathematical discipline of differential geometry provides the tools for us, and all we need to do is plug in the metric we have just found.
Other exact analytical solutions exist to generalise the exterior Schwarzschild metric by relaxing some of the boundary conditions; in particular :
- One can find a matching solution for the interior of the planet – the interior Schwarzschild metric
- It is possible to allow the body to have angular momentum ( Kerr metric ), net electric charge ( Reissner-Nordström metric ), or both ( Kerr-Newman metric )
- One can embed the body into a surrounding radiation field ( Vaidya metric )
- Any combination of the above
Many other solutions have been found that model very different situations, such as
- The gravitational metric of electromagnetic fields and radiation
- Systems of more than one body
- The gravitational collapse of stars
- Cosmological solutions that describe the universe as a whole
- Gravitational wave solutions
- Geon solutions, which model topological structures that are held together purely by their own gravitational self-energy
and others. Not all of these exist as closed analytical solutions ( some can only be modelled numerically with the aid of computers ), but those that do are all found using a common recipe :
- Find a suitable ansatz for the required metric from the boundary conditions of the system in question, as we have done with (17) above
- Calculate the non-vanishing Christoffel symbols from the metric ansatz
- Calculate the non-vanishing components of the Ricci tensor from the Christoffel symbols found above
- For interior solutions, determine the components of the energy-momentum tensor
- Insert everything into (14) or (15) to obtain a system of differential equations for the unknowns; solve if possible
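To illustrate the last step of this recipe, here is a sketch of how the vacuum equations collapse for a static, spherically symmetric ansatz with one unknown function B(r) multiplying dt² and another, A(r), multiplying dr². The assignment of A and B to these terms is an assumption made here for definiteness, and only the resulting ordinary differential equations are shown, not the intermediate Ricci components.

```latex
% Ansatz: ds^2 = -B(r)\,dt^2 + A(r)\,dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\varphi^2)
% Vacuum equations: R_{tt} = R_{rr} = R_{\theta\theta} = 0

% 1) The combination R_{tt}/B + R_{rr}/A = 0 gives
\frac{A'}{A} + \frac{B'}{B} = 0
\quad\Longrightarrow\quad A(r)\,B(r) = \text{const}

% Flat space far away (A, B \to 1 as r \to \infty) fixes the constant:
A(r) = \frac{1}{B(r)}

% 2) With this, R_{\theta\theta} = 0 reduces to
\frac{d}{dr}\bigl(r\,B(r)\bigr) = 1
\quad\Longrightarrow\quad B(r) = 1 + \frac{C}{r}

% 3) The Newtonian limit B \approx 1 - \frac{2GM}{c^2 r} fixes the last constant:
C = -\frac{2GM}{c^2}
```

Two of the simplifying conditions, staticity and spherical symmetry, went into the ansatz, while asymptotic flatness and the Newtonian limit were needed to fix the integration constants. This is the sense in which boundary conditions have to be supplied alongside the field equations.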
Whether or not an analytic solution is possible will depend on the number and complexity of the resulting system of equations; in general terms, the fewer the symmetries of the problem at hand, the harder it will become to find a solution. If we do manage to find a metric, we can then go on to calculate geodesics in our space-time, as well as how those geodesics deviate. The geodesics themselves are obtained by translating their definition ( world lines of test particles in free fall, i.e. world lines at each point of which proper acceleration is exactly zero ) into mathematical language :
$$\frac{D^2 x^{i}}{d\tau^2} = 0 \qquad (20)$$
Remember we are now in a curved space-time, so we need to use the second covariant derivative with respect to proper time, or else we get just straight lines, which is physically meaningless. Written out explicitly, this gives us a set of four differential equations for the four components of the geodesic trajectory :
$$\frac{d^2 x^{i}}{d\tau^2} + \Gamma^{i}{}_{jk}\,\frac{dx^{j}}{d\tau}\,\frac{dx^{k}}{d\tau} = 0 \qquad (21)$$
This is called the geodesic equation, the solutions of which are geodesics of space-time.
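The geodesic equation can also be integrated numerically once the Christoffel symbols are known. As a minimal sketch, the code below does this for the unit 2-sphere rather than for a space-time, since the answer is known in advance: geodesics are great circles. It assumes the standard form d²x^i/ds² + Γ^i_jk (dx^j/ds)(dx^k/ds) = 0 with the sphere's Christoffel symbols, and uses a fourth-order Runge-Kutta integrator; the function names, step count and starting direction are illustrative assumptions.

```python
import math

def geodesic_rhs(state):
    """Right-hand side of the geodesic equation on the unit sphere.

    state = (theta, phi, dtheta/ds, dphi/ds)
    """
    theta, phi, d_theta, d_phi = state
    dd_theta = math.sin(theta) * math.cos(theta) * d_phi**2                 # -Gamma^theta_phi_phi
    dd_phi = -2.0 * (math.cos(theta) / math.sin(theta)) * d_theta * d_phi   # -2 Gamma^phi_theta_phi
    return (d_theta, d_phi, dd_theta, dd_phi)

def rk4_step(f, state, h):
    """One fourth-order Runge-Kutta step of size h."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def integrate_geodesic(state, length, steps=4000):
    """Follow the geodesic for a total arc length `length`."""
    h = length / steps
    for _ in range(steps):
        state = rk4_step(geodesic_rhs, state, h)
    return state

# Start on the equator with unit speed, tilted by alpha towards the north pole.
alpha = 0.5
start = (math.pi / 2, 0.0, -math.sin(alpha), math.cos(alpha))
end = integrate_geodesic(start, 2.0 * math.pi)
print("start:", start)
print("end  :", end)
```

After an arc length of 2π the curve is back on the equator, having gone once around in φ with its original direction, which is what a great circle should do. The same scheme carries over to a space-time metric by swapping in its Christoffel symbols and using proper time as the parameter.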
This concludes our quick overview of the mathematics of GR. Only the most basic of concepts are presented here, so much more is to be said on the subject; I urge you to also read the Manifolds and Curvature article if you haven’t done this yet. With these tools in hand, we can now turn our attention to the more physical aspects of GR as a theory.
As always – stay tuned !
 Misner/Thorne/Wheeler, Gravitation, fig. 1.11
 http://math.ucr.edu/home/baez/einstein/einstein.pdf, eqn (2)