Title: The Rotary Club of Shanghai in the press
Year Start: 1919
Year End: 1948
Date: Thursday 11 June 2020

The attached tables contain the data for the press corpus we built to analyze the presence of the Rotary Club of Shanghai in the local press. Built by combining two keyword phrases ("Shanghai Rotary Club" and "The Rotary Club of Shanghai"), this corpus contains all the press articles that mention the Rotary Club of Shanghai in three major English-language newspapers: North-China Herald, Millard's Review (and its successor China Weekly/Monthly Review), and China Press. We selected these newspapers because they gave the club's activities the most coverage and were published with few interruptions over the period. We also selected them because they are easily accessible in digital format in the ProQuest collection of Chinese newspapers, which makes them suitable for large-scale analyses.

For this research, we used the R package "enpchina" developed by Pierre Magistry, our computational linguist in the European Research Council (ERC) project "Elites, Networks and Power in Modern China". This package was specifically tailored for the ProQuest collection of Chinese newspapers. Drawing on Natural Language Processing (NLP) techniques, it offers a set of advanced tools for exploring large corpora of historical newspapers, in particular complex queries, statistical analysis, named entity extraction, network analysis and graph visualization. We relied on three major groups of functions from the package (the R script is provided below):

(1) the function search_documents() was used to build the corpus and obtain the attached table of documents (articles) (table 1).

Based on this query, we obtained a corpus of 949 occurrences distributed across 865 documents (articles) and six different publications (mostly three: China Press, China Weekly Review, North-China Herald), ranging from 1919 to 1948, with a peak in 1936. The results of the query are contained in the first table. 
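The difference between occurrences and documents in the figures above can be illustrated with a minimal sketch. It does not reproduce search_documents() or any other part of the enpchina package; it is plain Python, and the three toy records, the function name search_corpus and the field names are hypothetical. It only assumes that the two keyword phrases are matched case-insensitively, which may differ from how the actual query was run.

```python
import re
from collections import Counter

# Hypothetical toy records for illustration only; NOT taken from the corpus.
documents = [
    {"id": "d1", "year": 1935,
     "text": "The Shanghai Rotary Club met on Thursday. "
             "The Rotary Club of Shanghai heard a talk."},
    {"id": "d2", "year": 1936,
     "text": "Members of the Rotary Club of Shanghai held their weekly meeting."},
    {"id": "d3", "year": 1936,
     "text": "The municipal council discussed road repairs."},
]

# The two keyword phrases named in the description.
KEYWORDS = ["Shanghai Rotary Club", "The Rotary Club of Shanghai"]


def search_corpus(docs, keywords):
    """Return one row per matching document with its number of occurrences.

    Assumption: case-insensitive exact phrase matching.
    """
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    hits = []
    for doc in docs:
        n = len(pattern.findall(doc["text"]))
        if n:
            hits.append({"id": doc["id"], "year": doc["year"], "occurrences": n})
    return hits


hits = search_corpus(documents, KEYWORDS)
total_occurrences = sum(h["occurrences"] for h in hits)  # every keyword match
total_documents = len(hits)                              # distinct articles
docs_per_year = Counter(h["year"] for h in hits)         # yearly distribution
```

Because one article can contain several matches, the occurrence count is always at least as large as the document count, which is why the corpus has more occurrences than documents.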

(2) the function search_concordance() was used to obtain the attached concordance table, which contains all occurrences of the Rotary Club in their original context (sentence) (table 2).
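A sentence-level concordance of this kind can be sketched as follows. This is not the search_concordance() implementation, whose parameters are not described here; it is a stdlib Python illustration in which the function name concordance, the column names left/match/right and the sample text are all hypothetical. The sentence splitter is deliberately naive and would break on abbreviations such as "Mr.".

```python
import re

# The two keyword phrases named in the description.
KEYWORDS = ["Shanghai Rotary Club", "The Rotary Club of Shanghai"]


def concordance(text, keywords):
    """Return one row per keyword match, with the rest of its sentence as context."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    rows = []
    for sentence in sentences:
        for m in pattern.finditer(sentence):
            rows.append({
                "left": sentence[:m.start()],   # context before the match
                "match": m.group(0),            # the keyword as printed
                "right": sentence[m.end():],    # context after the match
            })
    return rows


# Hypothetical sample text, not taken from the corpus.
sample = ("The Shanghai Rotary Club met on Thursday. "
          "Lunch was served at noon. "
          "A speaker addressed the Rotary Club of Shanghai.")
rows = concordance(sample, KEYWORDS)
```

Each row keeps the match together with the text to its left and right, so a reader can scan all mentions in context without reopening the articles.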

(3) the functions run_ner() and ner_on_corpus() were used to extract named entities (especially persons) from the corpus (table 3). This table also includes statistical analyses on documents and persons.

Keywords: Rotary club; Shanghai; newspapers; meeting; NLP; NER; corpus

rotary_doc.xlsx (87.06 KB)

rotary_conc.xlsx (70.54 KB)

rotary_ner_new.xlsx (27.11 MB)

rotary_chap.R (3.65 KB)

Last update: Wednesday 8 May 2024 (14:18)