July 12, 2021

Phonological Representation Database for Chinese Characters: Database Instructions

How to use the database

The database provides an easily accessible interface to the phonological representations of Mandarin Chinese characters for psycholinguists and connectionists. It can be used as the phonological input representation in modeling studies of Mandarin language processing and acquisition. The interface is self-explanatory and easy to use. By answering a few simple questions, researchers can obtain Mandarin phonological representations that fit the needs of their own research.

1. Feature Codes

There are two types of representations to fit the varying needs of different computational models. One is coded by real values between 0.0 and 1.0, and the other is coded by binary numbers (0 or 1). Users can select the one they want by clicking the corresponding checkbox. The default is the real-valued code.

2. Tones

Mandarin includes 5 tones: 0 (Neutral), 1 (Flat), 2 (Rising), 3 (Falling-Rising) and 4 (Falling). These tones are important features of Chinese phonology but might not be relevant to the research goals of every researcher. Here users can choose whether they want the phonological representation with or without tone information.

3. Justification of Phonemes in the Template

Similar to the PatPhon system (Li & MacWhinney, 2002), the Mandarin sounds are represented based on a template (CVVVC) suited to Mandarin phonological features. In the template, the phonemes can be arranged starting from the leftmost slot (Left-justified) or the rightmost slot (Right-justified). For example, the sound 'can' can be represented in the CVVVC template as [ca--n] (Left-justified) or [c--an] (Right-justified). In general, the left-justified representations place emphasis on phonological similarities at the beginning of the sounds, while the right-justified representations place emphasis on the coda of the sounds.
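The two justifications can be illustrated with a minimal Python sketch. It is not the code used to build the database: the function name fill_cvvvc is hypothetical, and the sketch assumes that the onset and coda always stay in the outer C slots and that justification only shifts the vowels within the three V slots, which is consistent with the 'can' example but is not stated explicitly above. The filler defaults to '-' as in the example; pass '_' to use the marker described for the ASCII template.

```python
def fill_cvvvc(onset, vowels, coda, justify="left", filler="-"):
    """Place a syllable's phonemes into a five-slot CVVVC template.

    Assumption (not confirmed by the database documentation): the onset and
    coda keep their outer C slots, and only the vowels move within VVV.
    """
    if len(onset) > 1 or len(coda) > 1 or len(vowels) > 3:
        raise ValueError("syllable does not fit the CVVVC template")
    if justify == "left":
        nucleus = vowels.ljust(3, filler)   # vowels first, fillers after
    elif justify == "right":
        nucleus = vowels.rjust(3, filler)   # fillers first, vowels after
    else:
        raise ValueError("justify must be 'left' or 'right'")
    # An empty onset or coda is shown as a single filler slot.
    return (onset or filler) + nucleus + (coda or filler)


print(fill_cvvvc("c", "a", "n", "left"))   # ca--n
print(fill_cvvvc("c", "a", "n", "right"))  # c--an
```

Both calls reproduce the bracketed forms given for 'can'.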

4. IPA (International Phonetic Alphabet)

The IPA symbols representing the pronunciation of every syllable can be shown in the results by choosing this option. The IPA symbols, along with the example characters, can guide researchers in pronouncing the Mandarin sounds.

5. ASCII Symbols in CVVVC template

Similar to PatPhon, the phonemes of a syllable are represented in a CVVVC template. Here the phonemes are represented by single-letter ASCII symbols instead of IPA symbols, which makes the syllable structure clear and saves space at the same time. The empty slots in the template are marked by '_'. Please note that the justification selected in question 3 changes how the same sound is represented here.

6. Example Characters

For users who are not familiar with the Pinyin system, we also provide example characters for each syllable (without tones) in our database.

7. Specified Search

You can narrow down your search by typing Pinyin symbols (without tone) into the input field beside 'Pinyin' (e.g. an, wang). If you selected 'with tones' in question 2, a new input field labelled 'Tone' appears, where you can type the tone (0, 1, 2, 3, or 4) of the syllable. If you leave the 'Tone' field empty, all five tones (where applicable) of the same Pinyin will be shown in the results.

You can choose a fuzzy query or an exact-match query. The fuzzy query is useful if you want to find all the possible sounds that include a certain combination of phonemes (e.g. all the sounds including 'ang'). If you choose this option and keep the 'Pinyin' field empty, all the possible sounds in the database will be shown. The exact-match query is useful if you want to find the record whose sound exactly matches what you type in the input fields.
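The difference between the two query types can be sketched in a few lines of Python. This is an illustration only, not the database's actual search code: the function name search and the sample syllable list are hypothetical, and the sketch assumes that "fuzzy" means plain substring containment, as suggested by the 'ang' example.

```python
def search(syllables, pinyin="", fuzzy=True):
    """Return the syllables matching a toneless Pinyin query.

    Assumption: a fuzzy query matches any syllable containing the query
    string; an exact-match query requires the whole syllable to be equal.
    """
    query = pinyin.strip().lower()
    if fuzzy:
        # The empty string is contained in every syllable, so an empty
        # fuzzy query returns the full list.
        return [s for s in syllables if query in s]
    return [s for s in syllables if s == query]


sample = ["an", "ang", "wang", "can", "zhang"]  # hypothetical subset
print(search(sample, "ang", fuzzy=True))   # ['ang', 'wang', 'zhang']
print(search(sample, "ang", fuzzy=False))  # ['ang']
print(search(sample, "", fuzzy=True))      # every syllable in the list
```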

8. Show and Save Query Results

After answering the seven questions, you can click the 'show me the results!' button. A new page will pop up with the numerical representations and the other features of the sounds you chose. If your web browser is outdated, it may not show the Chinese characters and IPA symbols correctly. You can solve this problem by setting the character encoding of the results page to Unicode (UTF-8) in your browser.

The quickest way to save your data (if using a PC) is to press Ctrl+S (or choose File > Save Page As) to download the contents of the page. Before saving the data, be sure to remove the ".htm" extension from the file name and change the file type to "Text". This will save your data as a ".txt" file, allowing you to open it in a text editor (e.g. Notepad).

If this method does not work (e.g. you are using a Mac or the 'S' key is broken), you can highlight the contents of the table, copy them, and paste them into a text file or a spreadsheet.

Because the data on the webpage are arranged in a table, the columns will automatically be separated by a single tab character when you save the data as a text file. If you then import your text file into a spreadsheet (e.g. Microsoft Excel), the data will fit neatly into the cells, so the table layout is preserved.

Note: Since the Chinese characters and the IPA symbols are encoded in Unicode (UTF-8), they may not be shown correctly in a plain text editor. You can solve this problem by changing the font of the text to one that supports Unicode (e.g. 'Arial Unicode MS').
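For researchers who want to load a saved results file directly into a modeling script rather than a spreadsheet, the following is a minimal Python sketch. It assumes only what is described above, namely that the saved file is tab-separated and encoded in UTF-8. The function name read_results and the file name results.txt are hypothetical, and the sketch makes no assumption about which columns the file contains.

```python
import csv


def read_results(path):
    """Read a saved results file into a list of rows (lists of strings).

    Assumes a tab-separated, UTF-8 encoded text file.
    """
    with open(path, encoding="utf-8", newline="") as handle:
        # QUOTE_NONE treats quotation marks as ordinary characters, which
        # suits plain text saved from a web page.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)


if __name__ == "__main__":
    # "results.txt" is a placeholder for the file you saved.
    for row in read_results("results.txt"):
        print(row)
```

Opening the file with an explicit UTF-8 encoding keeps the Chinese characters and IPA symbols intact regardless of the system's default encoding.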