The following graphs provide basic information on the press corpus we built to analyze the presence of the Rotary Club of Shanghai in the local press. The tabulated data used to build the graphs are available in the "Tables" section. The first bar chart summarizes the key features of the corpus (number of documents and their distribution across newspapers over time). The next two graphs show the number of occurrences (words or bags of words) and documents (articles) that mention the Rotary Club of Shanghai in the press, while the remaining ones examine in more detail the distribution of occurrences and documents (articles) across the three major English-language newspapers.
Built by combining two sets of keywords ("Shanghai Rotary Club" and "The Rotary Club of Shanghai"), this corpus contains all the press articles that mention the Rotary Club of Shanghai in three major English-language newspapers: North-China Herald, Millard's Review (and its successor China Weekly/Monthly Review), and China Press. We selected these newspapers because they gave the club's activities the most coverage and were published with appreciable continuity. We also selected them because they are easily accessible in digital format in the ProQuest collection of Chinese newspapers, which makes them suitable for large-scale analysis. To build the corpus, we used the R package “enpchina” developed by Pierre Magistry, our computational linguist in the European Research Council (ERC) project “Elites, Networks and Power in Modern China”. This package was specifically tailored for the ProQuest collection of Chinese newspapers. Drawing on Natural Language Processing (NLP) techniques, it offers a set of advanced tools for exploring large corpora of historical newspapers, in particular complex queries, statistical analysis, named entity extraction, network analysis and graph visualization. The keywords were carefully selected to reduce the noise arising from the fuzzy meaning of the word “rotary” (which literally means “wheel” or “mill”) and to exclude articles dealing with Rotary clubs outside Shanghai.
Based on this query, we obtained a corpus of 949 occurrences distributed across 865 documents (articles) and six different publications, three of which account for most of the material (China Press, China Weekly Review, North-China Herald). The corpus ranges from 1919 to 1948, with a peak in 1936.
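The difference between occurrences and documents can be illustrated with a minimal Python sketch. It is not the “enpchina” package and does not reproduce its query syntax: the article schema (`newspaper`, `text`), the case-insensitive matching, the `summarize` helper and the toy articles are all assumptions made for illustration. Only the two keyword phrases come from the description above.

```python
import re
from collections import Counter

# The two keyword phrases named in the text; matching them
# case-insensitively is an assumption of this sketch.
KEYWORDS = ["Shanghai Rotary Club", "The Rotary Club of Shanghai"]
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def summarize(articles):
    """Count keyword occurrences and matching documents per newspaper.

    `articles` is an iterable of dicts with 'newspaper' and 'text' keys
    (a hypothetical schema, not the ProQuest or enpchina format).
    """
    occurrences = Counter()  # every keyword hit counts
    documents = Counter()    # each matching article counts once
    for article in articles:
        hits = len(PATTERN.findall(article["text"]))
        if hits:
            occurrences[article["newspaper"]] += hits
            documents[article["newspaper"]] += 1
    return occurrences, documents


# Toy articles invented for illustration only.
toy_articles = [
    {"newspaper": "China Press",
     "text": "The Shanghai Rotary Club met on Thursday. "
             "Members of the Rotary Club of Shanghai heard a talk."},
    {"newspaper": "North-China Herald",
     "text": "A luncheon of the Shanghai Rotary Club was held."},
    {"newspaper": "China Press",
     "text": "The Rotary Club of Peking met yesterday."},
]

occurrences, documents = summarize(toy_articles)
print(sum(occurrences.values()), sum(documents.values()))
```

On the toy data, the first article contains two occurrences, the second contains one, and the third (a Rotary club outside Shanghai) is excluded. This gives three occurrences in two documents, which mirrors on a small scale why the corpus holds more occurrences (949) than documents (865).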