Data Analysis Essay Sample

Week 4 Individual Assignment: T-test

Problem Statement

This study rests on the premise that total unemployment, expressed as a percentage of the total labor force, depends on the adult literacy rate, expressed as a percentage of the population aged 15 and above. The unemployment rate is therefore treated as the dependent variable Y, and the literacy rate as the independent variable X. Both variables are measured in percent.

Study Objectives 

This research primarily aims to establish whether there is any association between unemployment and the rate of adult literacy.   

Research Question

Is there any substantial linear relationship between unemployment and the rate of adult literacy?

The null and alternative hypotheses below are used to test whether the two variables are associated.

H0:  No significant correlation exists between unemployment and the rate of adult literacy.  

H1:  A significant correlation exists between unemployment and the rate of adult literacy.


This research will use 0.05 as its alpha level of significance.    
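
One common way to test H0 against H1 at the 0.05 level is the t-test for a Pearson correlation coefficient, where t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The sketch below assumes that approach. The function names and the ten data points are hypothetical and are not the study's data, and the critical value 2.306 is the standard two-tailed t-table entry for 8 degrees of freedom.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_t_statistic(r, n):
    """t statistic for H0: no linear correlation (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical values for 10 countries (NOT the study's data).
literacy = [99.0, 97.5, 95.0, 94.0, 92.0, 90.0, 85.0, 80.0, 72.0, 60.0]   # X, percent
unemployment = [5.1, 7.7, 6.3, 10.4, 9.0, 12.5, 11.0, 15.2, 14.0, 20.3]  # Y, percent

r = pearson_r(literacy, unemployment)
t = correlation_t_statistic(r, len(literacy))

# Two-tailed critical t value for alpha = 0.05 and 8 degrees of freedom,
# read from a standard t table (assumption: n = 10, so df = 8).
T_CRITICAL = 2.306
reject_h0 = abs(t) > T_CRITICAL
print(f"r = {r:.3f}, t = {t:.3f}, reject H0: {reject_h0}")
```

With a different sample size, the critical value must be looked up again for the new degrees of freedom.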

Definition of the Variables used in the Study

In this study, unemployment (Y) denotes the percentage of the labor force that is without work but available for and willing to work. Because it is expressed as a percentage, unemployment is measured on a ratio scale.

Various organizations have researched unemployment extensively. According to UNESCO, the literacy rate is the proportion of a population aged 15 years and above who can read, write, and understand a short, simple statement about their daily life. Literacy (X) generally also includes 'numeracy', which is the ability to make simple arithmetic calculations.

Conversely, adult illiteracy refers to the proportion of the population aged 15 years and above who cannot read, write, or understand a short, simple statement about their daily life. Because the literacy rate is always expressed as a percentage, it is also measured on a ratio scale. Studies estimate that an increase in the level of literacy has the effect of reducing unemployment.

                                                            LITERATURE REVIEW  

Background of the Study

Many factors have affected unemployment since the inception of the contemporary market economy. Unemployment is defined as the percentage of the labor force that has no work but is ready and willing to work, so people in this group are always searching for employment. This study rests on the premise that unemployment, expressed as a percentage of the total labor force, depends on the adult literacy rate, expressed as a percentage of the population aged 15 years and above.

                                                RESEARCH METHODOLOGY

Data Collection

The study collected unemployment values (Y) from a secondary source, the World Bank's World Development Indicators. Observations of the literacy rate (X) came from UNESCO. The analysis covered 30 countries. Because all values correspond to the same year, the data set is cross-sectional.

Primary Data Gathered from World Bank and UNESCO

Descriptive Statistics

The illustration below represents the descriptive statistics for the rate of unemployment and the literacy rate:

Below is the graphical depiction of both unemployment and literacy rates of the above data. 

The box plot serves as an indicator of symmetry. The box plots for the unemployment and literacy variables are provided in the illustration below.

The Goodness of fit test for unemployment is given below.

The Goodness of fit test for Literacy is given below.

                                                            DATA ANALYSIS

An Overview of the Descriptive Statistics

The mean is the sum of all observations divided by the number of observations. In this case, the mean level of unemployment stood at 12.22, showing that the average unemployment rate is 12.22%. The mean literacy level stood at 89.45.

The median is the value that divides a data set into two equal halves when the data are arranged in ascending order. Here, the median unemployment rate is 10.40: half of the countries under investigation had an unemployment rate above 10.4 and the other half had a rate below it. The median literacy level was 94.37, so half of the countries have a literacy level above 94.37 and the other half are below it.
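
The two definitions can be illustrated with Python's standard `statistics` module. This is a minimal sketch on hypothetical rates, not the study's data. A mean that sits above the median, as in the reported 12.22 versus 10.40 for unemployment, is typically a sign of a long right tail.

```python
import statistics

# Hypothetical unemployment rates in percent (NOT the study's data).
rates = [4.0, 7.0, 9.0, 12.0, 28.0]

mean_rate = statistics.mean(rates)      # sum of the values divided by their count
median_rate = statistics.median(rates)  # middle value of the sorted data

# One large value (28.0) pulls the mean above the median.
print(mean_rate, median_rate)
```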

The mode is the value that occurs most frequently in a data set. In this case, the unemployment mode is 7.67, which means that 7.67 is the most frequently occurring unemployment rate among the countries under investigation.
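
As a small sketch on hypothetical values (not the study's data), the mode can be read off with `statistics.mode`. The example also shows that the most frequent value is generally not the largest one.

```python
import statistics

# Hypothetical unemployment rates in percent (NOT the study's data).
rates = [7.67, 10.4, 7.67, 12.1, 25.3, 7.67, 10.4]

mode_rate = statistics.mode(rates)  # most frequently occurring value
max_rate = max(rates)               # largest value, shown for contrast

print(mode_rate, max_rate)
```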

The frequency, or count, gives the total number of observations considered in a study. In this case, the count stands at 18, meaning that the unemployment and literacy rates were each recorded for 18 countries.

Variance and standard deviation are measures of the dispersion of a data set about its mean. The standard deviation measures how closely the data are concentrated around the mean: the more tightly the data cluster around the mean, the smaller the standard deviation. In this case, the standard deviation is low, implying that the mean is a reliable summary of the data set.
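
The link between concentration and standard deviation can be sketched with two hypothetical data sets that share the same mean. The values are illustrative only, and the sample (n - 1) formulas of the `statistics` module are an assumption, since the essay does not say which formula was used.

```python
import statistics

# Two hypothetical data sets with the same mean (12.0) but different spread.
tight = [11.0, 12.0, 13.0]
spread = [2.0, 12.0, 22.0]

# Sample variance and standard deviation (divisor n - 1).
tight_var, tight_sd = statistics.variance(tight), statistics.stdev(tight)
spread_var, spread_sd = statistics.variance(spread), statistics.stdev(spread)

# The tightly clustered set has the smaller standard deviation.
print(tight_sd, spread_sd)
```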

Skewness refers to a lack of symmetry. The study of skewness evaluates the shape of the distribution curve of a data set. In this case, the unemployment data are positively skewed, showing that few countries experience very high unemployment. The literacy data, however, are negatively skewed, showing that few countries have a low literacy rate.

Kurtosis helps to ascertain whether the shape of the data's distribution conforms to the Gaussian (normal) distribution. For instance, a distribution flatter than the normal tends to have negative excess kurtosis.
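
Skewness and excess kurtosis can be computed from central moments. The sketch below uses the plain moment-based (population) formulas, which is an assumption: statistical packages often apply small-sample corrections, so their output may differ slightly. The function names and data are hypothetical.

```python
def central_moment(data, k):
    """k-th central moment: average of (value - mean) ** k."""
    n = len(data)
    mean = sum(data) / n
    return sum((v - mean) ** k for v in data) / n

def skewness(data):
    """Moment-based skewness: m3 / m2 ** 1.5."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so a normal distribution scores 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3.0

# Hypothetical data (NOT the study's data).
right_tailed = [4.0, 5.0, 5.0, 6.0, 6.0, 7.0, 8.0, 30.0]  # a few high values
left_tailed = [100.0 - v for v in right_tailed]           # mirror image: a few low values
flat = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]                     # evenly spread values

print(skewness(right_tailed))  # positive: long right tail
print(skewness(left_tailed))   # negative: long left tail
print(excess_kurtosis(flat))   # negative: flatter than the normal curve
```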

Graphical Representation

The histogram of the unemployment data shows that it is positively skewed, while the histogram of the literacy rate shows that it is negatively skewed.

The box plots show the same pattern: the unemployment data are positively skewed, while the literacy data are negatively skewed. The box plot for unemployment shows no outliers, whereas the box plot for literacy rates shows one outlier.
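
Box plots conventionally flag as outliers any points lying more than 1.5 times the interquartile range (IQR) beyond the quartiles. The essay does not state which rule its software applied, so the sketch below assumes this common convention. The helper name `find_outliers` and both data sets are hypothetical.

```python
import statistics

def find_outliers(data, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in data if v < lower or v > upper]

# Hypothetical rates in percent (NOT the study's data).
unemployment = [4.0, 5.5, 7.0, 7.7, 9.0, 10.4, 12.0, 14.0, 17.0, 21.0]
literacy = [55.0, 88.0, 90.0, 92.0, 94.0, 95.0, 96.0, 97.0, 98.0, 99.0]

print(find_outliers(unemployment))  # no value falls outside the fences
print(find_outliers(literacy))      # the single very low value is flagged
```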

An Overview of the Goodness of fit test

For unemployment, the null hypothesis states that the unemployment rates follow a normal distribution:

H0: Yi follows N(12.22, 1.60^2)

The alternative hypothesis states that the unemployment rates do not follow a normal distribution:

H1: Yi does not follow N(12.22, 1.60^2)

Since the p-value is below alpha (0.05), the researcher rejects H0 at the 5 percent significance level and concludes that the unemployment rates do not follow a normal distribution, that is, Yi does not follow N(12.22, 1.60^2).

For literacy, the null hypothesis states that the literacy rates follow a normal distribution:

H0: Xi follows N(89.45, 3.11^2)

The alternative hypothesis states that the literacy rates do not follow a normal distribution:

H1: Xi does not follow N(89.45, 3.11^2)

p-value = 0.0000

Since the p-value is below alpha (0.05), the researcher rejects H0 at the 5 percent significance level and concludes that the literacy rates do not follow a normal distribution, that is, Xi does not follow N(89.45, 3.11^2).
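
The essay does not name the goodness-of-fit test behind these p-values. As one possible illustration of the decision rule, the sketch below substitutes the Jarque-Bera test, which measures departure from normality through skewness and kurtosis. Its statistic is approximately chi-square distributed with 2 degrees of freedom, so the p-value is exp(-JB / 2). This approximation is rough for small samples. The function name and data are hypothetical.

```python
import math

def jarque_bera(data):
    """Jarque-Bera statistic and its asymptotic p-value (chi-square, 2 df)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    m4 = sum((v - mean) ** 4 for v in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # survival function of chi-square with 2 df
    return jb, p_value

ALPHA = 0.05

# Hypothetical samples (NOT the study's data).
skewed = [94.0, 95.0, 96.0] * 6 + [95.0, 30.0]  # one extreme low value
symmetric = [8.0, 9.0, 10.0, 10.0, 11.0, 11.0, 12.0, 12.0, 13.0, 14.0]

for name, sample in (("skewed", skewed), ("symmetric", symmetric)):
    jb, p = jarque_bera(sample)
    decision = "reject H0" if p < ALPHA else "fail to reject H0"
    print(f"{name}: JB = {jb:.3f}, p = {p:.4f}, {decision}")
```

A p-value below alpha leads to rejecting normality, which is the same decision rule applied in the text above.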


  • The mean for unemployment stood at 12.22, while the mean literacy level is 89.45. The average unemployment rate is therefore 12.22%.
  • The median unemployment rate was 10.40. Half of the countries in the study had an unemployment rate above 10.4, and the other half had a rate below it.
  • The median literacy level was 94.37. Half of the countries had a literacy level above 94.37, and the other half were below it.
  • The unemployment data are positively skewed, which indicates that few countries have high rates of unemployment. The literacy data are negatively skewed, which indicates that few countries have a low literacy rate.
  • Neither the unemployment rate nor the literacy rate follows a normal distribution.



World Bank. (n.d.). Unemployment, total (% of total labor force) (modeled ILO estimate). Retrieved from

UNESCO. (n.d.). Education: Literacy rate. Retrieved from

Berenson, M., Levine, D., Szabat, K. A., & Krehbiel, T. C. (2012). Basic business statistics: Concepts and applications. Pearson Higher Education AU.

Croxton, F. E., & Cowden, D. J. (1939). Applied general statistics.

Heiman, G. (2015). Behavioral sciences STAT 2 (2nd ed.). Stamford, CT: Cengage.
