July 07, 2012

Kending 墾丁 and the National Museum of Marine Biology 國立海洋生物博物館

I've been in Taiwan for almost three years now, and although I've tried to be more active in my local travels lately, I had never made it to the famous beaches in the south of the island, which are scattered around the small town of Kending (墾丁). In general, beaches in Taiwan are not what you would expect from a tropical island, since Taiwan is an island of volcanic origin, and the only postcard-type beaches can be found in the south.


I left Taipei on a sunny and humid Sunday morning. It wasn't even 9 am yet and the temperature was already at 35 degrees. I took the High Speed Rail from Taipei Main Station and after about 90 minutes arrived in Gaoxiong (高雄), the second largest city in Taiwan. From there I took a bus called the "Kending express", which after about an hour of a pretty sound ride arrived at the National Museum of Marine Biology (國立海洋生物博物館), roughly 20 km away from Kending.

June 10, 2012

Hiking in Taiwan - Yangming Mountain biathlon

The Yangming Mountain National Park (陽明山國家公園) is a wonderful place to go hiking, with several hot springs, waterfalls, great views and, most of all, fresh air. The whole park is only a half-hour bus ride away from Taipei.

I have been to Yangming Mountain (陽明山) several times, but I always took a bus from the ShiLin MRT station (士林捷運站) to the Yangming Mountain tourist center and started the hike at the MiaoPu entrance (苗圃). I bought a bike last month and kept wondering whether there were any bike routes that would lead to the top of the mountain. As it turned out, there were none, because the last 2 km of paths are too steep to bike, but there are plenty of ways to get quite close. I searched the internet for some options and eventually decided to bike to LengShuiKeng (冷水坑), which is only 2.1 km away from the top of the mountain.
牛奶湖 (Milk lake) at the Lengshuikeng tourist center

March 19, 2012

Amount of characters and words necessary to read news articles

Abstract

Hello everyone and welcome to my never-ending study again. In the last two posts I tried to count the number of unique Chinese characters and words in Taiwanese news by analyzing 80 news articles from Taiwan over a period of six weeks. I found that there was a total of 2105 unique characters and 5901 unique words in the 80 articles I analyzed, which were separated into four sections: 國際 (international), 政治 (domestic politics), 社會 (society) and 財經 (economics). As I said, though, 80 articles was not enough, so I tried to extend the study. Using the sampled data I did some calculations and tried to predict what the number of unique characters and words in any given number of articles would be. I found that there would be a total of 2174 unique characters and 8424 unique words, and a person would thus need to know this many characters and words to recognize 100% of any given number of news articles, as long as these articles were from the same four news sections I analyzed.
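For readers who would like to try something similar, here is a minimal sketch in Python of how the number of unique characters can be tracked as more articles are added. It is not the calculation I used for the prediction; the function name and the toy "articles" are made up for illustration, and the sketch assumes that only characters in the main CJK Unified Ideographs block (U+4E00 to U+9FFF) should be counted.

```python
def cumulative_unique_chars(articles):
    """Return the running count of distinct Chinese characters after each article."""
    seen = set()
    counts = []
    for text in articles:
        # Assumption: only the main CJK Unified Ideographs block is counted,
        # so punctuation, digits and Latin letters are ignored.
        seen.update(ch for ch in text if "\u4e00" <= ch <= "\u9fff")
        counts.append(len(seen))
    return counts


# Hypothetical toy "articles", not the data from the study.
sample = ["台灣新聞", "國際新聞", "台灣政治"]
print(cumulative_unique_chars(sample))  # [4, 6, 8]
```

If the counts stop growing as more articles are added, the curve has flattened out; with real news data it keeps creeping upwards for a long time.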

Introduction

The main task was to predict how the unique character and word charts would evolve and at what point on the y-axis they would stop ascending. The x-axis value corresponding to that point would be the total number of characters a person needs to know in order to recognize 100% of a random news article, as long as it is from one of the four sampled news sections. As you can see by looking at the following two charts, both of them have ascending trends, with the Word knowledge chart still ascending sharply at its end and seemingly not approaching any particular number.


March 06, 2012

Chinese word frequency list - News

In the last post I analyzed 80 news articles from Taiwan over a period of six weeks, provided some basic statistics and tried to come up with a Chinese character frequency list by counting the occurrence of unique characters in these articles. In this post I would like to write about the word frequency analysis of these articles.
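As a rough illustration of what counting the occurrence of unique characters means in practice, here is a small Python sketch. It is not the tool I used; the function name and the sample string are hypothetical, and it assumes that only characters in the main CJK Unified Ideographs block (U+4E00 to U+9FFF) count as Chinese characters.

```python
from collections import Counter


def char_frequencies(text):
    """Return (character, count) pairs, most frequent first."""
    # Assumption: anything outside U+4E00..U+9FFF (punctuation, digits,
    # Latin letters) is not a Chinese character and is skipped.
    counts = Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")
    return counts.most_common()


# Hypothetical sample text, not taken from the analyzed articles.
print(char_frequencies("的的的是是不"))  # [('的', 3), ('是', 2), ('不', 1)]
```

Characters are easy to count this way because each one is a clearly separate unit in the text.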

Research method

I again analyzed the same 80 articles which were divided into 4 areas: 國際 (international), 政治 (domestic politics), 社會 (society) and 財經 (economics) with 20 articles in each area.

During the whole word frequency analysis, the biggest problem was actually separating Mandarin words from each other. As I mentioned in the previous post, and as most of those studying or speaking Chinese know, words are not separated by spaces in Chinese. Counting the occurrence of unique words, as opposed to unique characters, therefore requires much more work: unless you want to count word frequency with pen and paper, the computer program doing the work for you needs something that separates words from one another in order to know what to count. There are fairly complicated computer programs that can do this sort of indexation for Mandarin automatically, but since I didn't have any of those, I had to do the indexation manually.

Counting the occurrence of unique words in an English article, for instance, would be much easier, because spaces between words in English texts mark very clearly where a word starts and where it ends. A computer program can thus use these spaces as index markers and consider everything in between them to be a separate word unit. In Mandarin this is unfortunately not possible.
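To illustrate the point about spaces, here is a minimal Python sketch that counts words by splitting on whitespace. The function name and both sample sentences are hypothetical, chosen only to show the difference.

```python
from collections import Counter


def count_words_by_spaces(text):
    """Count words, treating whitespace as the only word boundary."""
    return Counter(text.lower().split())


english = "the cat saw the dog"
mandarin = "我今天去學校"  # hypothetical sentence, written without spaces

print(count_words_by_spaces(english))   # 'the' is counted twice
print(count_words_by_spaces(mandarin))  # the whole sentence is one "word"
```

For the English sentence the program finds four different words, while the Mandarin sentence comes back as a single unit, which is why the words had to be marked by hand before counting.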

Take the following sentence for example:

February 21, 2012

Chinese character frequency list - News articles

I think a lot of those studying Mandarin Chinese have sooner or later started to wonder how many characters one really needs in order to function normally in a Chinese-only world, or what, for instance, the 500 most frequent Chinese characters are. I have personally heard a lot of numbers and seen several Chinese character frequency lists, but I often didn't understand why this or that character made it into the top 500, or why a list said I needed this or that number of characters to read something when I had the feeling the number was either overstated or understated. So I decided to try to do a little study of my own.

I tried to analyze approximately how many characters and words are necessary to read news in Mandarin. I chose four sections of Taiwanese news - politics, international, society and finance - all written in traditional Chinese characters during a six-week sample period.

If I'm correct, the field of computational linguistics deals with projects of this kind, and I'm sure that there are several teams of experts at linguistics departments worldwide that must have done similar research using much more sophisticated methods than I have. After the amount of effort it took me to analyze these few articles, I have a lot of respect for what they do.

January 06, 2012

Efficiency of Chinese characters

By Vladimir Skultety, M.A., B.A.

A lot of people say that Chinese characters are inefficient, because they are too complicated and there are too many of them. By contrast, they say that western alphabetic scripts are much easier to learn, much easier to write and thus much more efficient.

In this article, I tried to somewhat objectively analyze the situation, which was a bit hard, because I like Chinese characters a lot, but either way I looked at it, I still think that characters are at least as efficient and in some cases even much more efficient than western alphabetic scripts. 

Negatives:
  • There are a lot of them. I don’t like numbers, but it is true that you need to know at least 2500 – 3000 characters to read something. (Edit 5.5.2012 - strangely enough, after my study I found that you would actually only need about 2180 characters to read the newspaper)
  • It’s much more difficult to remember characters compared to the simple 35 or so letters of an alphabetic script
  • They are easy to forget
  • They are easy to confuse
  • You not only need to learn how to recognize them, you need to learn how to write them by hand which doubles your effort
  • They are impractical when you need to look up something in a list (dictionary, telephone list)
Positives:

December 11, 2011

矛楯 - Lances and shields

Hello everyone,

it's been a while since I did any Classical Chinese text analyses, and with nothing to do on this misty Sunday afternoon I thought I'd write a short one just to practice a little. Although it might not look like it, thanks to Google there are some people who find their way to my blog because they are looking for translations of sentences or expressions in Classical Chinese. This post is mostly for those who are already interested in Classical Chinese for one reason or another, or for anyone who might fall in love with it like the rest of us have.

I say this all the time, but I am not an expert on Classical Chinese; I merely have a Bachelor’s degree in Chinese studies. Most of these analyses are based on our classes at the Chinese department, and whenever I run into something I don’t remember the explanation for, I try to translate and explain it based on what I remember about Classical Chinese grammar and sentence structure. This might not always be correct, and I apologize for any mistakes in advance.

November 14, 2011

Remembering Farsi

After almost a year I have finally picked up my studies of Farsi where I left them. This amazing language has been lying dormant on my wish list for a long time; I have started learning it twice already and twice I have failed to carry on. The fact that there are, or were, no Farsi speakers around was one of the reasons for my pause, but not a very good excuse for it to have taken such a long time.

Either way, not having anyone around to practice the language with at the beginning is not very motivating. I now know how some people really have no choice but to rely on course books, and I have deep respect for those who live in places with little chance of meeting speakers of their target language, can mostly rely only on course books, and still make the most of their studies and learn a foreign language to fluency this way.

Farsi really is a wonderful language and the sheer thought that I could freely converse or read in it one day is very exciting, so a month ago I decided that I just had to force myself into my studies and be persistent. I wanted to write a short blog entry about where I stand and what ideas I have about the language now.

October 19, 2011

Mandarin Chinese tones – sound only approach

By Vladimir Skultety M.A., B.A.

I would like to talk about and build on a concept I wrote about in my earlier posts: developing a system in which students would remember Mandarin words without consciously knowing what tones or tonal combinations are in them, and would pronounce them correctly with less effort.

As the topic is quite complex, I would first like to go back to 2 earlier articles I wrote about tones and develop the thought from there.

Post from 11.30.2011 (edited):

When I first came to Taiwan, I remember being tired after even a 10-15 minute Mandarin conversation. I was unable to use the words I had learned before effortlessly, even after I had used them a hundred times in conversation practice. Each time I wanted to use these words I had to make at least some effort to recall them and constantly think of the tones, which was very tiring.

September 29, 2011

Learning an intermediate language - Italian

Hello all,

On my blog I have written articles about difficult and simple languages before, and I realized that I haven’t written anything about intermediate languages yet, so I will try to dedicate an entire post to them now. As I mentioned earlier, there are probably much better divisions of languages based on their difficulty. I do not challenge them, but I find that up until now, all the languages that I’ve learned fall into three simple categories: simple, intermediate, difficult – depending on how far they are from a language I already speak at a native/advanced fluency level.

For me an intermediate language (or a language that I find to be intermediately difficult to learn) is:
  • A language that is outside of my native language group, or outside the language group of a language that I already speak well, but still within the same general family[1]
  • The grammar is at least 50% identical with the languages I already speak at an advanced/native fluency level
  • Another 30% of concepts present in the grammar are concepts that can also be found in the languages I already speak but are used rarely or formulated in a different way
  • At least 10% of grammar concepts are completely alien to me
  • There is a large number of cognates in the language, but different pronunciation might leave them unrecognized at first
  • The sound system is at least 50% identical[2] with the languages I already speak
  • Literal translations are often possible
  • Cultural difference is not a substantial issue
From a strictly analytical point of view, if you look at English and Italian for instance, you can almost go as far as saying that they are two distant dialects of Indo-European. They both share large amounts of Latin or Greek based vocabulary, Italian vocabulary has received a lot of influence from English, there are numerous grammar concepts that overlap, and a lot of expressions in Italian can be translated directly into English, often literally.