Data collation is an essential foundation of research, a necessary step for improving the quality and usefulness of survey data, and a prerequisite for preserving that data. Its guiding principles are authenticity, qualification, accuracy, completeness, systematic organization, consistency, conciseness, and novelty.
Sorting and classification:
The materials collected in social research can generally be divided into numerical data and written materials. The former are obtained through structured questionnaires and interviews, involve a large number of respondents, and can be grouped and summarized statistically. The latter mostly come from unstructured observation, interviews, and literature, and usually cover a small number of typical cases. The collation process for the two kinds of material is basically the same, but the methods differ.
1. Written materials
In social research, qualitative data are mostly written materials, so the collation of written materials is generally called qualitative data collation. The methods differ slightly depending on the source of the materials, but the work can usually be divided into three basic steps: review, classification, and compilation.
Review mainly addresses the authenticity, accuracy, and applicability of the written materials. Classification sorts the materials into categories, bringing order and system to complicated material and providing a basis for identifying regular relationships. There are two approaches: pre-classification, in which the categories are set before the materials are collected, and post-classification, in which the materials are categorized after collection.
Compilation means summarizing and editing the classified materials according to the actual requirements of the research, so that they become systematic, complete materials that reflect the objective situation of the respondents. Written materials obtained through observation, interviews, and literature collection are arranged in the following steps:
(1) Check the authenticity and reliability of the materials: whether the observation records contain personal bias, whether the respondents reported the situation truthfully, and whether the literature sources are reliable.
(2) Extract from the original materials the main content relevant to the research purpose, so as to simplify the materials. Steps (1) and (2) together are described as "discarding the false and keeping the true, discarding the dross and keeping the essence".
(3) Organize the materials by theme, person, or time, and set up files for them. This makes the materials easier to search and supports further qualitative analysis, such as comparison of types or analysis over time. The content of written materials can also be converted into numerical form for quantitative content analysis.
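Step (3), together with the conversion of text into counts for quantitative content analysis, can be sketched in Python as follows. This is a minimal illustration only: the record fields (`theme`, `person`, `date`, `text`), the sample notes, and the keyword set are hypothetical, and a real content analysis would need a proper coding scheme rather than simple word matching.

```python
from collections import Counter, defaultdict

# Hypothetical interview notes; the field names are assumptions for illustration.
records = [
    {"theme": "housing", "person": "A", "date": "2020-03-01",
     "text": "Rent is high and the rent keeps rising."},
    {"theme": "work", "person": "B", "date": "2020-03-02",
     "text": "Overtime is common but wages are stable."},
    {"theme": "housing", "person": "C", "date": "2020-03-05",
     "text": "We moved twice because of rent."},
]


def build_files(records, key):
    """Group records into 'files' keyed by theme, person, or date."""
    files = defaultdict(list)
    for record in records:
        files[record[key]].append(record)
    return dict(files)


def keyword_counts(records, keywords):
    """Count how often each keyword occurs in the text of the records."""
    counts = Counter()
    for record in records:
        # Crude tokenization: lowercase, drop periods and commas, split on spaces.
        words = record["text"].lower().replace(".", " ").replace(",", " ").split()
        for word in words:
            if word in keywords:
                counts[word] += 1
    return counts


by_theme = build_files(records, "theme")
housing_counts = keyword_counts(by_theme["housing"], {"rent", "wages"})
print(sorted(by_theme))        # ['housing', 'work']
print(housing_counts["rent"])  # 3
```

Passing `"person"` or `"date"` as the key to `build_files` gives the other two ways of filing the same materials.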
Data collation is an important link in the transition from the investigation stage to the research stage, and from perceptual knowledge to rational knowledge. It is also an important step in improving the reliability and validity of research, and it bears directly on the reliability and accuracy of data analysis and research conclusions. Collating data scientifically and reasonably is therefore of great importance to sociological research.
2. Numerical data
Numerical data are the basis of quantitative analysis in research, so their collation is also called quantitative data collation. To draw correct conclusions, the data must be processed further at this stage. The general procedure consists of checking the data, grouping, summarizing, and producing statistical tables or charts. Checking mainly verifies the completeness and correctness of the data so that the research results are more accurate.
Grouping divides the survey data into different parts according to a chosen characteristic. Summarizing gathers the grouped data into the relevant tables for calculation and totaling, according to the purpose of the research, so as to reflect the quantitative characteristics of the respondents in a concentrated and systematic way.
Data can be summarized manually or by machine. Once summarized, the data are generally presented in tables or graphs, most commonly statistical tables and statistical charts.
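The procedure described above (checking, grouping, summarizing, and tabulating) can be sketched in Python as follows. The questionnaire fields (`id`, `region`, `income`), the sample values, and the validity rule are all assumptions made for illustration and do not come from any particular survey.

```python
from collections import defaultdict

# Hypothetical questionnaire responses.
responses = [
    {"id": 1, "region": "urban", "income": 5200},
    {"id": 2, "region": "rural", "income": 3100},
    {"id": 3, "region": "urban", "income": None},   # incomplete answer
    {"id": 4, "region": "rural", "income": 2900},
    {"id": 5, "region": "urban", "income": 4800},
    {"id": 6, "region": "urban", "income": -50},    # impossible value
]


def check(responses):
    """Split responses into valid ones and rejected (incomplete or incorrect) ones."""
    valid, rejected = [], []
    for row in responses:
        income = row["income"]
        if income is None or income < 0:
            rejected.append(row)
        else:
            valid.append(row)
    return valid, rejected


def group_and_summarize(rows, key, value):
    """Group rows by `key`, then compute the count and mean of `value` per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {
        name: {"count": len(values), "mean": sum(values) / len(values)}
        for name, values in groups.items()
    }


valid, rejected = check(responses)
summary = group_and_summarize(valid, "region", "income")

# Print a simple statistical table, one row per group.
print(f"{'region':<8}{'count':>6}{'mean income':>13}")
for name in sorted(summary):
    stats = summary[name]
    print(f"{name:<8}{stats['count']:>6}{stats['mean']:>13.1f}")
```

Rejected rows are kept in a separate list rather than discarded silently, so that they can be reviewed or followed up before the summary is treated as final.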
Further information:
Reasons for collation:
Generally speaking, existing materials are hard to disseminate for several reasons: the language of the material is not easily understood by Chinese readers; the material exists only as scanned images rather than as text, which is easier to disseminate; because of its age, the book's layout does not suit current reading habits, or its text is in traditional Chinese characters; or neither images nor text of the book are available at all.
Methods of collation:
Collation therefore involves several tasks:
(1) Translate valuable foreign-language materials into Chinese.
(2) To reduce the workload, convert books with clear page images into text with recognition software first, then proofread the text against the original images. If the characters are traditional, the page images are unclear, or the typesetting is outdated, type in the text directly from the images.
(3) Scan and type in books and materials that are not currently available on the internet, producing electronic versions for dissemination.
(4) Proofread the typed or recognized text again.
(5) Assemble the typed and proofread text into a finished CHM or other e-book format.