Compilation of data or Classification of data
Classification of data: definition, methods of classification, class limit, class interval,
exclusive and inclusive classes, class frequency, cumulative frequency
MEANING OF CLASSIFICATION OR COMPILATION OF DATA.
The process of arranging data in groups or classes according to resemblances and
similarities is technically called classification. Thus, by classification we try to
strike a note of homogeneity in the heterogeneous elements of the collected information.
Classification gives expression to the similarities which may be found in the diversity
of individual units. In classification of data, units having a common characteristic are
placed in one class and, in this fashion, the whole data are divided into a number of
classes. Even after classification, the statistical data are not fit for comparison and
interpretation and need proper tabulation. After tabulation of data, statistical analysis
and interpretation are possible. Classification is a preliminary to tabulation and it
prepares the ground for proper presentation of statistical facts.
Collected data are often uninteresting in their raw form. Unorganised and shapeless data
can neither be easily compared nor interpreted. Therefore, after collection of the data,
the first step is to present the raw data in some orderly and logical form so that their
essential features become explicit. The technique of arranging the data in different
homogeneous groups is called classification of the data. Thus a lot of heterogeneous data
is subdivided into different groups on the basis of some common features and, after being
classified, the data can be tabulated for purposes of interpretation. In other words,
classification is the first step to tabulation.
According to H. Secrist, “Classification is the process of arranging data into sequences
and groups according to their common characteristics or separating them into different
but related parts.”
Main Objects of Classification.
1. Presenting Statistics in a Simple Form. The major function of classification is to
remove the complexity of statistics and present them in a simple form. Collected
statistics are very complex, and the common man cannot understand and absorb them. By
classification, statistics are made simple, and as a result it becomes easy to understand
and absorb them.
2. Depicting Homogeneity and Heterogeneity of Data. Classification reveals clearly the
points of homogeneity and heterogeneity in the statistical data, e.g. students who pass
on one side and students who fail on the other side.
3. Easy to Understand. Classification eliminates unnecessary details so that the salient
features of the collected data are more readily understandable. Classification minimises
mental strain.
4. Helpful in Comparison. Classification helps us in comparison. If the data are
unorganised and dissimilar, they cannot be compared. So for proper comparison,
classification of the data is needed.
5. Increase in Usefulness. Classification helps a person in forming a mental picture of
the phenomenon to which the given data relate. Classified data can be understood easily
even by a person without statistical training.
6. Scientific Management. Scientific management of the data is possible if the data are
properly classified.
7. Attractive and Effective. Classification helps in making the data attractive and
effective.
8. Basis of Tabulation. Classification is the basis of tabulation. So tabulation will be
proper if the classification is proper.
9. Presenting Statistics in a Condensed Form. Classification helps in condensation of
data. After collection of data in an enquiry we obtain a large number of figures. It is
necessary to give these figures a condensed form. This is done by classification.
10. Helpful in Presentation of Statistics. Statistics are presented with tables, graphs
or diagrams. This presentation of data requires classification.
11. Helpful in Analysis of Statistics. Analysis of data is the main part of an enquiry.
Data cannot be analysed without classification. For applying any technique of statistical
analysis, classification of data is a must.
12. Finding Unnecessary Statistics. While collecting data for an enquiry, many
unnecessary figures are collected. Classification renders much help in identifying these
unnecessary statistics. Therefore classification is very useful in Statistics.
TYPES OF CLASSIFICATION or METHODS OF CLASSIFICATION.
Statistical data are classified according to the characteristics possessed by them. These
common characteristics reveal the homogeneity of a group of units in the whole lot of
heterogeneous data. These characteristics can be either descriptive or numerical.
Descriptive characteristics cannot be quantitatively measured; only their presence or
absence in an individual unit can be found. For example, sex, nationality, literacy etc.
cannot be quantitatively expressed. What we can do is to determine whether an individual
is literate or illiterate, employed or unemployed. Numerical characteristics, on the
other hand, are capable of quantitative measurement, e.g. height, weight etc. When the
data are classified on the basis of descriptive characteristics which cannot be expressed
quantitatively, the classification is said to be according to attributes, and when the
data are classified on the basis of quantitative measurement, the classification is said
to be according to class intervals. On the basis of attributes and class intervals,
classification is of different types. Thus, broadly speaking, data can be classified on
the following four bases:
1. Spatial or Geographical, i.e., in relation to place
2. Chronological, i.e., on the basis of time
3. Qualitative, i.e., according to some attributes
4. Quantitative, i.e., in terms of magnitude.
1. Geographical Classification: In geographical classification, data are classified on
the basis of place. If, for example, we write down the population of the Indian Union on
the basis of various States, or the number of students in different universities of the
country, or the production of wheat in different geographical areas of the country, the
series that we would get would need classification on the basis of geographical
distribution. Series which are arranged on the basis of place are called spatial series.
It is a classification based on geographical regions. If the existing political
boundaries are taken as the basis, the classification may be done by states and
districts. The following is an example of a geographical distribution:
Country | National income in U.S. dollars
Canada | 7930
USA | 7880
West Germany | 7510
France | 6730
U.K. | 4180
U.S.S.R. | 2800
India | 140
Geographical classifications are generally listed in alphabetical order, or listed by
size of the frequency to emphasise the importance of the various geographical regions.
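The two listing orders mentioned above can be illustrated with a short Python sketch. The figures are copied from the table above; the variable names are chosen here purely for illustration.

```python
# National income figures copied from the table above (U.S. dollars).
national_income = {
    "Canada": 7930,
    "USA": 7880,
    "West Germany": 7510,
    "France": 6730,
    "U.K.": 4180,
    "U.S.S.R.": 2800,
    "India": 140,
}

# Alphabetical listing: sort by the name of the region.
alphabetical = sorted(national_income.items(), key=lambda item: item[0])

# Listing by size: sort by the figure, largest first.
by_size = sorted(national_income.items(), key=lambda item: item[1], reverse=True)

for country, income in by_size:
    print(f"{country:<14}{income:>6}")
```

The table above is already in the second order (largest figure first), which draws attention to the regions with the highest values.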
2. Chronological classification: When the data are classified on the basis of time, it is
known as chronological classification. Such series are also known as time series because
one of the variables in them is time. If the population of India during the last eight
censuses is classified, it will result in a time series or chronological classification.
In such a classification, data are classified either in ascending or in descending order
with reference to time, such as years, quarters, months, weeks etc. It is also called
temporal classification.
The following table would give an idea
of chronological classification :
Year | Production (soap), company X
1995 | 12800
1996 | 13988
1997 | 14288
1998 | 15779
1999 | 16827
2000 | 16989
2001 | 17828
3. Qualitative classification: Classification according to attributes. When the data are
classified on the basis of presence or absence of some attribute, which is incapable of
quantitative measurement, it is a descriptive classification or a classification by
attributes. Descriptive classification is of two types:
(a) Simple classification. In this method the entire statistical data are divided on the
basis of presence or absence of a particular attribute. All those units which possess a
particular attribute are put in one group and those without the attribute are placed in
another group, e.g. literate and illiterate, male and female etc.
e.g. Male | Not male (female)
(b) Manifold Classification. In this case we study more than one attribute
simultaneously, and the statistical data are divided into more than two classes, e.g. on
the basis of language we classify Hindi language, Gujarati language, Bengali language
etc.
[Diagram: each group is further subdivided into LITERATE and ILLITERATE.]
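As a rough illustration of classification by attributes, the following Python sketch divides a handful of records first into two classes (simple classification) and then into more than two classes (manifold classification). The records and field names are hypothetical and invented only for this example.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration only.
people = [
    {"name": "A", "literate": True, "language": "Hindi"},
    {"name": "B", "literate": False, "language": "Gujarati"},
    {"name": "C", "literate": True, "language": "Bengali"},
    {"name": "D", "literate": True, "language": "Hindi"},
]

# Simple classification: presence or absence of one attribute gives two classes.
literate = [p["name"] for p in people if p["literate"]]
illiterate = [p["name"] for p in people if not p["literate"]]

# Manifold classification: the data fall into more than two classes.
by_language = defaultdict(list)
for p in people:
    by_language[p["language"]].append(p["name"])

print("Literate:", literate)
print("Illiterate:", illiterate)
print("By language:", dict(by_language))
```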
4. Quantitative classification, i.e., in terms of magnitude.
Classification according to class intervals is also called quantitative classification.
It is based on characteristics which are capable of quantitative measurement, such as
height, weight, income, expenditure, or the number of marks obtained by the students of a
class. The difference between the upper limit and the lower limit of a class is known as
the class interval.
e.g.
Continuous series
Marks (class interval) | No. of students (frequency)
0-20 | 5
20-40 | 15
40-60 | 25
60-80 | 10
80-100 | 5
TOTAL | 60
Discrete series
Marks | No. of students
2 | 5
3 | 15
4 | 25
5 | 10
6 | 5
TOTAL | 60
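A frequency table of either kind can be built from raw observations by counting. The sketch below is one possible way to do this in Python; the raw marks are hypothetical (they are not the data behind the tables above), and `class_label` is a helper name introduced here for illustration.

```python
from collections import Counter

# Hypothetical raw marks, invented for illustration only.
raw_marks = [12, 35, 47, 58, 20, 63, 41, 79, 99, 5, 40, 55]

CLASS_WIDTH = 20  # same width as the 0-20, 20-40, ... classes


def class_label(mark, width=CLASS_WIDTH):
    """Return the class 'lower-upper' containing the mark (upper limit excluded).

    Assumption: marks lie in 0-99; a mark of exactly 100 would need
    special handling to stay in the 80-100 class.
    """
    lower = (mark // width) * width
    return f"{lower}-{lower + width}"


# Continuous series: count the marks that fall in each class interval.
continuous = Counter(class_label(m) for m in raw_marks)

# Discrete series: count how often each individual value occurs.
discrete = Counter([2, 3, 3, 4, 4, 4, 5])  # hypothetical small-valued marks

for label in sorted(continuous, key=lambda s: int(s.split("-")[0])):
    print(label, continuous[label])
```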
In this type of classification the following terms are used:
(a) Class limit. The class limits are the lowest and the highest values that can be
included in the class. The two boundaries of a class are known as the lower limit and the
upper limit of the class, e.g. in the class 10-20, 10 is the lower limit and 20 is the
upper limit.
(b) Magnitude of class-interval. The difference between the upper limit and the lower
limit is called the magnitude of the class interval.
(c) Mid-Value. It is the value lying half-way between the lower and upper class limits of
a class-interval. Mid-point of a class is ascertained as follows:
Mid-value = (Upper limit + Lower limit) / 2
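The three terms defined above can be checked with a few lines of Python. This is only a minimal sketch; the function names are chosen here for illustration.

```python
def class_magnitude(lower, upper):
    """Magnitude of the class interval: upper limit minus lower limit."""
    return upper - lower


def mid_value(lower, upper):
    """Mid-value: the point half-way between the two class limits."""
    return (upper + lower) / 2


# Class limits taken from the 0-20, 20-40, ... classes used in this section.
classes = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]

for lower, upper in classes:
    print(f"{lower}-{upper}: magnitude = {class_magnitude(lower, upper)}, "
          f"mid-value = {mid_value(lower, upper)}")
```

For the class 10-20 used as the example in (a), the magnitude is 20 - 10 = 10 and the mid-value is (20 + 10) / 2 = 15.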
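Exclusive method: the upper limit of one class is the lower limit of the next class, so a
value equal to an upper limit is counted in the next class.
Marks (class interval) | No. of students (frequency)
0-20 | 5
20-40 | 15
40-60 | 25
60-80 | 10
80-100 | 5
TOTAL | 60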
Inclusive method: both the lower limit and the upper limit are included in the class
itself.
Marks (class interval) | No. of students (frequency)
10-19 | 5
20-39 | 15
40-59 | 25
60-79 | 10
80-99 | 5
TOTAL | 60
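The same inclusive classes expressed with class boundaries (0.5 subtracted from each
lower limit and added to each upper limit):
Marks (class interval) | No. of students (frequency)
9.5-19.5 | 5
19.5-39.5 | 15
39.5-59.5 | 25
59.5-79.5 | 10
79.5-99.5 | 5
TOTAL | 60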
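The boundaries in the table above are consistent with the usual textbook adjustment: take half of the gap between the upper limit of one class and the lower limit of the next, subtract it from every lower limit and add it to every upper limit. A minimal Python sketch follows, assuming the gap is the same between all adjacent classes; `to_class_boundaries` is a name introduced here for illustration.

```python
# Inclusive class limits copied from the inclusive-method table.
inclusive_classes = [(10, 19), (20, 39), (40, 59), (60, 79), (80, 99)]


def to_class_boundaries(classes):
    """Convert inclusive class limits to continuous class boundaries.

    Assumes the gap between adjacent classes is the same throughout.
    """
    # Gap between the upper limit of the first class and the lower limit of the second.
    gap = classes[1][0] - classes[0][1]
    correction = gap / 2
    return [(lower - correction, upper + correction) for lower, upper in classes]


boundaries = to_class_boundaries(inclusive_classes)
for lower, upper in boundaries:
    print(f"{lower}-{upper}")
```

Here the gap is 20 - 19 = 1, so the correction is 0.5 and the first class 10-19 becomes 9.5-19.5.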
Cumulative frequency:
Marks | No. of students | Cumulative frequency
2 | 5 | 5
3 | 15 | 20
4 | 25 | 45
5 | 10 | 55
6 | 5 | 60
TOTAL | 60 |
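Each cumulative frequency is the running total of the frequencies up to and including that row. As a small sketch, the column can be reproduced in Python with `itertools.accumulate`, using the marks and frequencies from the table above.

```python
from itertools import accumulate

# Marks and frequencies copied from the cumulative frequency table.
marks = [2, 3, 4, 5, 6]
frequencies = [5, 15, 25, 10, 5]

# Running total of the frequencies gives the cumulative frequency column.
cumulative = list(accumulate(frequencies))

for mark, freq, cum in zip(marks, frequencies, cumulative):
    print(f"{mark} | {freq} | {cum}")
```

The last cumulative frequency always equals the total number of observations, here 60.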