Chapter 4 Missing values

4.1 Colonial History Dataset

From this, it seems that most countries have values for most variables. There are a few missing values in ColRulerCode because some countries were colonized by Austria-Hungary, which no longer exists and therefore has no ISO3 code.

Per capita GDP has a single missing value, for Taiwan. We are not sure why; Taiwan simply was not part of the UN dataset. This may be related to the political dispute between China and Taiwan.

The Gini coefficient and the inequality-adjusted human development index (IHDI) both have many missing values. We were surprised that IHDI had more missing values than the human development index (HDI). We expected HDI and IHDI to have many, but roughly the same number of, missing values, and to have more missing values than the Gini coefficient. Our reasoning was that because HDI combines several kinds of information, it would be less available. Instead, it seems that information on inequality is the least available. Since this information is fairly numeric (involving incomes and wealth), and such figures tend to be reasonably well documented through taxation, we were surprised that so much of it was missing.
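The missing-value counts discussed above come down to tallying, for each column, how many entries are missing. The sketch below is a minimal stdlib-only illustration under stated assumptions: the records are made-up toy rows (not the real dataset), None stands for a missing value, and every column name except ColRulerCode is hypothetical.

```python
# Hypothetical toy rows standing in for the Colonial History dataset.
# None marks a missing value; column names other than ColRulerCode are assumptions.
records = [
    {"Country": "CountryA", "ColRulerCode": "GBR", "GDPpc": 1200.0, "HDI": 0.55, "IHDI": None, "Gini": 41.0},
    {"Country": "CountryB", "ColRulerCode": None, "GDPpc": 900.0, "HDI": 0.48, "IHDI": None, "Gini": None},
    {"Country": "CountryC", "ColRulerCode": "FRA", "GDPpc": None, "HDI": 0.61, "IHDI": 0.44, "Gini": 38.5},
]


def missing_counts(rows):
    """Return a dict mapping each column name to its number of missing (None) entries."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            # (value is None) is a bool, which adds to an int as 0 or 1.
            counts[column] = counts.get(column, 0) + (value is None)
    return counts


counts = missing_counts(records)

# Print columns from most to least missing.
for column, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{column:>12}: {n} missing")
```

With pandas, the same per-column tally would be `df.isna().sum()`.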

When graphing against GDP per capita, HDI, and the Gini coefficient, we chose to handle missing values by omitting them. We felt that discussing them in detail here gives an adequate sense of what our graphs actually include. Because some of these indicators have many missing values, including them in the graphs would be distracting without contributing anything: with so many countries involved, an NA column would not help us interpret the data.
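Omitting missing values per graph amounts to keeping only the countries that have both plotted values. A minimal sketch, assuming made-up toy rows and hypothetical column names (in particular "ColYears", standing for some colonial-history variable plotted against each indicator):

```python
# Hypothetical toy rows; None marks a missing value.
# "ColYears" (years under colonial rule) and the other column names are assumptions.
rows = [
    {"Country": "CountryA", "ColYears": 150, "GDPpc": 1200.0, "HDI": 0.55, "Gini": 41.0},
    {"Country": "CountryB", "ColYears": 80, "GDPpc": 900.0, "HDI": 0.48, "Gini": None},
    {"Country": "CountryC", "ColYears": 200, "GDPpc": None, "HDI": 0.61, "Gini": 38.5},
]


def plot_ready(rows, x_col, y_col):
    """Return (country, x, y) tuples, omitting rows where either plotted value is missing."""
    return [
        (row["Country"], row[x_col], row[y_col])
        for row in rows
        if row.get(x_col) is not None and row.get(y_col) is not None
    ]


# Each graph drops only the countries missing one of its own two variables.
col_vs_gdp = plot_ready(rows, "ColYears", "GDPpc")  # drops CountryC
col_vs_gini = plot_ready(rows, "ColYears", "Gini")  # drops CountryB
col_vs_hdi = plot_ready(rows, "ColYears", "HDI")    # keeps all three
```

Filtering per graph, rather than dropping every country with any missing indicator, keeps as many countries as possible in each plot. In pandas, the corresponding call would be `df.dropna(subset=[x_col, y_col])`.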

As a note, for the sake of brevity, we did not include IHDI in our final results.

4.2 Angus Maddison Dataset

4.2.1 Maddison Dataset

[Figure: Missing Values in Maddison CGDPpc]

[Figure: Missing Values in Maddison CGDPpc after 1800]

(The files were saved as images for convenience of viewing and are under the github img/ folder.)

Looking at this data, it is clear that very little data is available before 1800; values exist only for select years (e.g., 500, 1000, and 1500). As the first chart shows, the periods between those years are simply empty space.

In the graph that takes a closer look at countries' per capita GDP after 1800, the countries with more data available than others include: South Africa, Venezuela, United States, Uruguay, Sweden, Soviet Union, Portugal, Netherlands, Norway, Poland, Peru, Sri Lanka, Mexico, Japan, Italy, Indonesia, Greece, UK, France, Denmark, Spain, China, Chile, Brazil, Australia, Argentina.

These countries would likely be more interesting to examine in terms of GDP over time, simply because more data is available for them.
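Identifying the countries with more data after 1800 can be approximated by counting non-missing observations per country from 1800 on and sorting. A minimal sketch, assuming a hypothetical wide layout (country mapped to {year: CGDPpc or None}) with made-up values; the real Maddison file's layout may differ.

```python
# Hypothetical wide table: country -> {year: per capita GDP, or None if missing}.
series = {
    "CountryA": {1700: None, 1820: 1100.0, 1850: 1250.0, 1900: None, 1950: 2300.0},
    "CountryB": {1820: None, 1850: None, 1900: 900.0, 1950: 1400.0},
    "CountryC": {1000: 600.0, 1820: None, 1850: None, 1900: None, 1950: 800.0},
}


def coverage_from(series, start_year):
    """Count non-missing observations per country for years >= start_year."""
    return {
        country: sum(1 for year, value in obs.items() if year >= start_year and value is not None)
        for country, obs in series.items()
    }


coverage = coverage_from(series, 1800)

# Countries with the most observations first.
ranked = sorted(coverage.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

Note that CountryC's value for the year 1000 does not count, since only years from 1800 on are considered.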

4.2.2 England Dataset

Examining the GDP data available for England, we can see that while there are values at 0, 500, and 1000, much of the data in that period does not appear. These years are not colored yellow as missing simply because they are not listed in the dataset at all. Because the rgdp and cgdp columns combine information on England and the UK, they have the most filled values, including every year from about 1200 onwards. We use this data to examine the per capita GDP of England over time.
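The distinction above, between years listed with a missing value and years absent from the table entirely, can be made explicit in code. A minimal sketch, assuming a hypothetical series with made-up values and an assumed grid of expected years (every 100 years):

```python
# Hypothetical series shaped like the England table: year -> per capita GDP.
# A key with value None is a listed-but-missing year (what a missingness plot colors);
# a year absent from the dict is not in the table at all. Values are made up.
series = {0: 10.0, 500: 11.0, 1000: 12.0, 1100: None, 1200: 15.0, 1300: 16.0}


def split_missingness(series, expected_years):
    """Split missing years into explicit (listed as None) and implicit (not listed)."""
    explicit = sorted(y for y in expected_years if y in series and series[y] is None)
    implicit = sorted(y for y in expected_years if y not in series)
    return explicit, implicit


explicit, implicit = split_missingness(series, range(0, 1301, 100))
print("listed but missing:", explicit)
print("not listed at all:", implicit)
```

In pandas, `Series.reindex(range(0, 1301, 100))` would turn such implicit gaps into explicit NaN values, so they would show up in a missingness plot.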

Again, when working with the Maddison data we chose to omit missing values from the plots, because we discuss their impact in detail here and in the analysis; including them in the graphs would be both redundant and distracting.

4.3 Colonial Transformation Dataset

Upon inspection, the countries that are missing data fall into the following patterns:

* Countries missing only one value: Egypt, Jordan, Lesotho. These values are absent from the original CSV published on the University of Zurich website.
* Mongolia, which is missing its colonization beginning and end dates and has all "present" values set to 0.
* Similarly, Singapore, Brunei, and Hong Kong, which have no encoded values.

These four countries (Mongolia, Singapore, Brunei, and Hong Kong), while present in the database, seem to have been considered by the researchers as outside the scope of their studies, and their scores should not be considered.
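Sorting countries into missingness patterns like these can be done with a small rule-based classifier. A minimal sketch, assuming made-up toy rows and hypothetical column names ("ColStart", "ColEnd", "PresentScore", "OtherScore"); the actual dataset's columns may be named and organized differently.

```python
# Hypothetical rows: country -> {column: value}; None marks a missing value.
# Column names are assumptions, not the dataset's real names.
rows = {
    "CountryA": {"ColStart": 1880, "ColEnd": 1960, "PresentScore": 2, "OtherScore": None},
    "CountryB": {"ColStart": None, "ColEnd": None, "PresentScore": 0, "OtherScore": 0},
    "CountryC": {"ColStart": None, "ColEnd": None, "PresentScore": None, "OtherScore": None},
    "CountryD": {"ColStart": 1900, "ColEnd": 1950, "PresentScore": 1, "OtherScore": 3},
}


def classify(values):
    """Label a country's row by which of its values are missing."""
    missing = {column for column, value in values.items() if value is None}
    if not missing:
        return "complete"
    if missing == set(values):
        return "no encoded values"
    if {"ColStart", "ColEnd"} <= missing:
        return "missing colonization dates"
    if len(missing) == 1:
        return "single missing value"
    return "other"


patterns = {country: classify(values) for country, values in rows.items()}

# Countries whose pattern suggests they fall outside the study's scope.
out_of_scope = sorted(
    country for country, pattern in patterns.items()
    if pattern in {"missing colonization dates", "no encoded values"}
)
print(patterns)
print(out_of_scope)
```

The rule order matters: a row with every value missing is labeled "no encoded values" before the colonization-date check is reached.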