I want to work in the emerging field of artificial intelligence. I am excited about the possibilities of automating the grunt work in our daily lives, and that is what drives me to learn this.
One of the important subjects to learn and master is statistics and probability. To that end, I'm working through the Statistics and Probability course on Khan Academy. It is an excellent course and I recommend taking it. This post is just a summary of the concepts for quick reference.
Analyzing categorical data:
Individuals, variables, categorical and quantitative variables.
Consider the following dataset.
Alek is taking an inventory of styles of compression bandages for work. Here is the data he has collected.
Style ID | Width (inches) | Total length (yards) | Color |
---|---|---|---|
001 | 1 | 20 | tan |
002 | 1 | 20 | brown |
003 | 1 | 10 | red |
004 | 1 | 15 | blue |
005 | 2 | 35 | tan |
006 | 2 | 20 | brown |
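In this dataset, the bandage styles are the individuals. For each style there are three variables: width, total length, and color. Color is a categorical variable because it takes its value from a set of categories. Width and total length are quantitative (numerical) variables because they take numeric values that can be measured.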
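As a rough sketch, the same dataset can be written in Python with one record per individual. The field names (`style_id`, `width_in`, `length_yd`, `color`) and the helper `variable_kind` are made up for this illustration. The rule it uses is only a heuristic: a numeric code such as a ZIP code would still be a categorical variable.

```python
# The bandage inventory from the table above, one dict per individual (style).
styles = [
    {"style_id": "001", "width_in": 1, "length_yd": 20, "color": "tan"},
    {"style_id": "002", "width_in": 1, "length_yd": 20, "color": "brown"},
    {"style_id": "003", "width_in": 1, "length_yd": 10, "color": "red"},
    {"style_id": "004", "width_in": 1, "length_yd": 15, "color": "blue"},
    {"style_id": "005", "width_in": 2, "length_yd": 35, "color": "tan"},
    {"style_id": "006", "width_in": 2, "length_yd": 20, "color": "brown"},
]


def variable_kind(values):
    """Heuristic (an assumption of this sketch): numeric values mean quantitative."""
    if all(isinstance(v, (int, float)) for v in values):
        return "quantitative"
    return "categorical"


# The style ID only labels the individual, so it is not treated as a variable.
variables = [name for name in styles[0] if name != "style_id"]
kinds = {name: variable_kind([row[name] for row in styles]) for name in variables}

print(kinds)
# {'width_in': 'quantitative', 'length_yd': 'quantitative', 'color': 'categorical'}
```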
Reading bar graphs, pictographs
Bar graphs can display a numeric value for each category, with the height of each bar showing that value. They can also display the frequency of each category.
A pictograph can show, for example, the number of sheep owned by various people. The same data can be shown in a bar graph, where the height of each bar gives the count.
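To make the frequency idea concrete, here is a minimal sketch that counts how often each color appears in the bandage dataset and prints a crude text bar graph. The list `colors` is copied from the Color column of that table. Using `#` characters as bars is just a stand-in for a real plot.

```python
from collections import Counter

# Color column of the bandage dataset, in table order.
colors = ["tan", "brown", "red", "blue", "tan", "brown"]

# Frequency of each category: how many styles have each color.
frequency = Counter(colors)

# One text "bar" per category; the bar length is the frequency.
for color, count in frequency.items():
    print(f"{color:<6} {'#' * count} {count}")
```

A plotting library would draw proper bars, but the counting step stays the same.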
Two way tables and Venn diagrams
When the data involve two categorical variables, the counts can be shown in a two-way frequency table or a Venn diagram.
Preference | Male | Female |
---|---|---|
Prefers dogs | 36 | 22 |
Prefers cats | 8 | 26 |
No preference | 2 | 6 |
The same data can also be shown in a Venn diagram.
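A two-way table like the one above can be sketched in Python as a nested dictionary, with the first key for one variable and the second key for the other. The variable names here are my own choice for illustration. Summing across either key gives the row and column totals.

```python
# Two-way frequency table: preference (rows) by gender (columns).
table = {
    "Prefers dogs": {"Male": 36, "Female": 22},
    "Prefers cats": {"Male": 8, "Female": 26},
    "No preference": {"Male": 2, "Female": 6},
}

# Row totals: how many people gave each preference, regardless of gender.
row_totals = {pref: sum(counts.values()) for pref, counts in table.items()}

# Column totals: how many people of each gender were surveyed.
column_totals = {}
for counts in table.values():
    for gender, n in counts.items():
        column_totals[gender] = column_totals.get(gender, 0) + n

grand_total = sum(row_totals.values())

print(row_totals)     # {'Prefers dogs': 58, 'Prefers cats': 34, 'No preference': 8}
print(column_totals)  # {'Male': 46, 'Female': 54}
print(grand_total)    # 100
```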
Marginal and conditional distributions
Weather condition | On time | Delayed | Total |
---|---|---|---|
Sunny | 167 | 3 | 170 |
Cloudy | 115 | 5 | 120 |
Rainy | 40 | 15 | 55 |
Snowy | 8 | 12 | 20 |
Total | 330 | 35 | 365 |
The last row and the last column are called marginal distributions because they are written in the margins of the table. The last column (Total) is the marginal distribution of weather conditions: it gives the total count for each weather condition, whether the trains were on time or delayed.
The last row (Total) is the marginal distribution of trains being on time or delayed, across all weather conditions.
The On time column is a conditional distribution: the distribution of weather conditions among on-time trains only.
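The distributions can also be expressed as proportions rather than counts. The sketch below computes them from the train table. The dictionary layout and variable names are assumptions made for this example.

```python
# Counts from the train table: weather condition (rows) by status (columns).
counts = {
    "Sunny": {"On time": 167, "Delayed": 3},
    "Cloudy": {"On time": 115, "Delayed": 5},
    "Rainy": {"On time": 40, "Delayed": 15},
    "Snowy": {"On time": 8, "Delayed": 12},
}

grand_total = sum(sum(row.values()) for row in counts.values())  # 365

# Marginal distribution of weather: the Total column divided by the grand total.
weather_marginal = {w: sum(row.values()) / grand_total for w, row in counts.items()}

# Marginal distribution of status: the Total row divided by the grand total.
status_totals = {"On time": 0, "Delayed": 0}
for row in counts.values():
    for status, n in row.items():
        status_totals[status] += n
status_marginal = {s: n / grand_total for s, n in status_totals.items()}

# Conditional distribution of weather given that the train was on time:
# the On time column divided by the On time total (330).
weather_given_on_time = {
    w: row["On time"] / status_totals["On time"] for w, row in counts.items()
}

# Conditional distribution of status given snowy weather:
# the Snowy row divided by the Snowy total (20).
snowy = counts["Snowy"]
status_given_snowy = {s: n / sum(snowy.values()) for s, n in snowy.items()}

print(round(weather_marginal["Sunny"], 3))       # 0.466
print(round(weather_given_on_time["Sunny"], 3))  # 0.506
print(round(status_given_snowy["On time"], 3))   # 0.4
```

Each distribution sums to 1, which is a quick check that the right total was used as the denominator.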