From ce599e4f9f94b4eb00c1b5edb85bce5431ab3df2 Mon Sep 17 00:00:00 2001 From: toma Date: Wed, 25 Nov 2009 17:56:58 +0000 Subject: Copy the KDE 3.5 branch to branches/trinity for new KDE 3.5 features. BUG:215923 git-svn-id: svn://anonsvn.kde.org/home/kde/branches/trinity/kdeedu@1054174 283d02a7-25f6-0310-bc7c-ecb5cbfe19da --- kiten/edict_doc.html | 695 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 695 insertions(+) create mode 100644 kiten/edict_doc.html (limited to 'kiten/edict_doc.html') diff --git a/kiten/edict_doc.html b/kiten/edict_doc.html new file mode 100644 index 00000000..fc52d1e4 --- /dev/null +++ b/kiten/edict_doc.html @@ -0,0 +1,695 @@ + + + + + +EDICT Documentation + + + +

E D I C T

+

+

+

JAPANESE/ENGLISH DICTIONARY FILE

+ +

+Copyright (C) 2003 The Electronic Dictionary Research and Development Group, +Monash University. +

+

+Contents: +

+ +

+INTRODUCTION +

+

+The EDICT file results from a long-running project to produce a freely +available Japanese/English Dictionary in machine-readable form. +

+

+The EDICT file is copyright, and is distributed in accordance with the +Licence Statement, which can found at the WWW site of the + Electronic Dictionary Research and Development Group +who are the owners of the copyright. +

+

+CURRENT VERSION +

+

+The version date and sequence number is included in the dictionary itself +under the entry "EDICT". (Actually it is under the JIS-ASCII code "????". +This keeps it as the first entry when it is sorted.) +

+

+The master copy of EDICT is in the pub/nihongo directory of + ftp.cc.monash.edu.au. +There are other copies around, but they may not be +as up-to-date. The easy way to check if the version you have is the latest is +from the size/date. +

+

+As of V96-001, the EDICT file no longer contains proper names. These have +been moved to a separate file called "ENAMDICT". +From V99-002, the EDICT file has been generated from an extended dictionary +database which includes additional fields and information. See the later +section on the new JMdict project for details of this. +

+

+FORMAT +

+

+EDICT's format is that of the original "EDICT" format used by the early +PC Japanese word-processor MOKE (Mark's Own Kanji Editor). +It uses EUC-JP coding for kana and kanji, however this can be converted to +JIS (ISO-2022-JP) or Shift-JIS +by any of the several conversion programs around. It is a text file with one +entry per line. The format of entries is: +

+

+

+
+KANJI [KANA] /English_1/English_2/.../
+
+

+or +

+

+

+
+KANA /English_1/.../
+
+

+(NB: Only the KANJI and KANA are in EUC; all the other characters, including +spaces, must be ASCII.) +

+

+The English translations are deliberately brief, as the application of the +dictionary is expected to be primarily on-line look-ups, etc. +

+

+The EDICT file is not intended to have its entries in any particular order. +In fact it almost always is in order as a by-product of the update method I +use, however there is no guarantee of this. (The order is almost always JIS ++ alphabetical, starting with the head-word.) +

+

+EDICT HISTORY +

+

+EDICT has developed as follows: +

+
    +
  1. +it began with the basic EDICT distributed with MOKE 2.0. This was compiled by MOKE's +author, Mark Edwards, with assistance from Spencer Green. Mark +kindly released this material to the EDICT project. A number of corrections +were made to the MOKE original, e.g. spelling mistakes, minor +mistranslations, etc. It also had a lot of duplications, which have been +removed. It contained about 1900 unique entries. Mark Edwards has also +kindly given permission for the vocabulary files developed for KG (Kanji +Guess) to be added to EDICT. +
  2. +
  3. +additions by Jim Breen. I laboriously keyed in a ~2000 entry dictionary +used in my first year nihongo course at Swinburne Institute of Technology +years ago (I was given permission by the authors to do this). I then worked +through other vocabulary lists trying to make sure major entries were not +omitted. The English-to-kana entries in the SKK files were added also. This +task is continuing, although it has slowed down, and I suspect I will run out +of energy eventually. Apart from that, I have made a large number of +additions during normal reading of Japanese text and fj.* news using JREADER +and XJDIC. (As of November 2001 I am still adding entries.) +

    +

    +
  4. +
  5. +additions by others. Many people have contributed entries and +corrections to EDICT. I am forever on the lookout for sources of material, +provided it is genuinely available for use in the Project. I am +grateful to Theresa Martin who an early supplier a lot of useful material, +plus very perceptive corrections. Hidekazu Tozaki has also been a great help +with tidying up a lot of awry entries, and helping me identify obscure kanji +compounds. Kurt Stueber has been an assiduous keyer of many useful entries. +A large group of contributions came from Sony, where Rik Smoody had put +together a large online dictionary. Another batch came from the +Japanese-German JDDICT file in similar format that Helmut Goldenstein keyed +(with permission) from the Langenscheidt edited by Hadamitzky. Harold Rowe +was great help with much of the translation. During 1994, Dr Yo Tomita, then +at the University of Leeds, conducted a massive proof-reading of the entire +file, for which I am most grateful. Jeffrey Friedl at Omron in Kyoto has also +been a most helpful contributor and error-detector. During 1995, I have been +keeping an eye on the "honyaku" mailing list, wherein Japanese-English +translators discuss thorny issues. From this I have derived many new entries, +and many updates to existing entries. To the many honyakujin, my thanks. +
  6. +
+A reasonably full list of contributors is at the back of this file, +although I am sure to have missed a few. +

+At this stage EDICT has many more entries than many good commercial dictionaries, +which typically have 20,000+ non-name entries with examples, etc. It is +certainly bigger than some of the smaller printed dictionaries, and when used +in conjunction with a search-and-display program like JDIC or XJDIC it +provides a highly effective on-line dictionary service. +

+

+COPYRIGHT ISSUES +

+

+Dictionary copyright is a difficult point, because clearly the first +lexicographer who published "inu means dog" could not claim a copyright +violation over all subsequent Japanese dictionaries. While it is usual to +consult other dictionaries for "accurate lexicographic information", as +Nelson put it, wholesale copying is, of course, not permissible. What makes +each dictionary unique (and copyrightable) is the particular selection of +words, the phrasing of the meanings, the presentation of the contents (a very +important point in the case of EDICT), and the means of publication. Of +course, the fact that for the most part the kanji and kana of each entry are +coming from public sources, and the structure and layout of the entries +themselves are quite unlike those in any published dictionary, adds a degree +of protection to EDICT. +

+

+The advice I have received from people who know about these things is that +EDICT is just as much a new dictionary as any others on the market. Readers +may see an entry which looks familiar, and say "Aha! That comes from the XYZ +Jiten!". They may be right, and they may be wrong. After all there aren't +too many translations of neko. Let me make one thing quite clear, despite +considerable temptation (Electronic Books can be easily decoded), NONE of +this dictionary came from commercial machine-readable dictionaries. I have a +case of RSI in my right elbow to prove it. +

+

+Please do not contribute entries to EDICT which have come directly from +copyrightable sources. It is hard to check these, and you may be +jeopardizing EDICT's status. +

+

+LEXICOGRAPHICAL DETAILS +

+

+Introduction +

+

+EDICT is actually a Japanese->English dictionary, although the words within +it can be selected in either language using appropriate software. (JDIC uses +it to provide both E->J and J->E functionality.) +

+

+The early stages of EDICT had size limitations due to its usage (MOKE scans +it sequentially and JDXGEN, which is JDIC's index generator, held it in RAM.) +This meant that examples of usage could not be included, and inclusion of +phrases was very limited. JDIC/JDXGEN can now handle a much larger +dictionary, but the compact format has continued. +

+

+No inflections of verbs or adjectives have been included, except in idiomatic +expressions. Similarly particles are handled as separate entries. Adverbs +formed from adjectives (-ku or ni) are generally not included. Verbs are, of +course, in the plain or "dictionary" form. +

+

+Priority Entries +

+

+Starting with the 2001 editions, approximately 20,000 entries comprising the most commonly-used words in Japanese are marked +with a "(P)" at the end of the entry. This list has been identified by +examining several small +dictionaries, and lists of common gairaigo from Japanese newspapers. +

+

+Parts of Speech +

+

+In working on EDICT, bearing in mind I want to use it in MOKE and with JDIC, +I had to come up with a solution to the problem of adjectival nouns +[keiyoudoushi] (e.g. kirei and kantan), nouns which can be used adjectivally +with the particle "no" and verbs formed by adding suru (e.g. benkyousuru). +If I put entries in EDICT with the "na" and "suru" included, MOKE would not +find a match when they are omitted or, the case of suru, inflected. What I +decided to do is to put the basic noun into the dictionary and add +"(vs)" where it can be used to form a verb with suru, "(a-no)" for common +"no" usage, and "(an)" if it is an adjectival noun. Entries appeared as: +

+

+

+
+KANJI [benkyou] /study (vs)/ 
+KANJI [kantan] /simple (an)/ 
+
+

+In early 2001, as part of the JMdict project (see below), I completely revised +this system, instead introducing a comprehensive system of Part of Speech +(POS) tags. In the EDICT version of the file these tags usually appear in +parentheses +at the start of the entry, separated into general tags and POS tags. Where +a tag applies to a single gloss or meaning, it will be included there instead. +

+

+The (hopefully) full list of such markers is: +

+

+

+
+abbr 	   abbreviation
+adj 	   adjective (keiyoushi)
+adv 	   adverb (fukushi)
+adv-n 	   adverbial noun
+adj-na    adjectival nouns or quasi-adjectives (keiyodoshi)
+adj-no    nouns which may take the genitive case particle "no"
+adj-pn	   pre-noun adjectival (rentaishi)
+adj-s	   special adjective (e.g. ookii)
+adj-t	   "taru" adjective
+arch 	   archaism
+ateji     ateji reading of the kanji
+aux 	   auxiliary word or phrase
+aux-v 	   auxiliary verb
+conj	   conjunction
+col 	   colloquialism 
+exp	   Expressions (phrases, clauses, etc.)
+ek	   exclusively kanji, rarely just in kana
+fam 	   familiar language 
+fem 	   female term or language
+gikun 	   gikun (meaning) reading
+gram 	   grammatical term
+hon 	   honorific or respectful (sonkeigo) language 
+hum 	   humble (kenjougo) language 
+id 	   idiomatic expression 
+int	   interjection (kandoushi)
+iK 	   word containing irregular kanji usage
+ik 	   word containing irregular kana usage
+io 	   irregular okurigana usage
+MA 	   martial arts term
+male 	   male term or language
+m-sl 	   manga slang
+n	   noun (common) (futsuumeishi)
+n-adv	   adverbial noun (fukushitekimeishi)
+n-t	   noun (temporal) (jisoumeishi)
+n-suf     noun, used as a suffix
+n-pref    noun, used as a prefix
+neg 	   negative (in a negative sentence, or with negative verb)
+neg-v 	   negative verb (when used with)
+num       number, numeric
+obs 	   obsolete term
+obsc 	   obscure term
+oK 	   word containing out-dated kanji 
+ok 	   out-dated or obsolete kana usage
+pol 	   polite (teineigo) language 
+pref 	   prefix 
+prt       particle
+qv 	   quod vide (see another entry)
+sl 	   slang
+suf 	   suffix 
+uK 	   word usually written using kanji alone 
+uk 	   word usually written using kana alone 
+v1	   Ichidan verb
+v5	   Godan verb (not completely classified)
+v5u	   Godan verb with `u' ending
+v5u-s	   Godan verb with `u' ending - special class
+v5k	   Godan verb with `ku' ending
+v5g	   Godan verb with `gu' ending
+v5s	   Godan verb with `su' ending
+v5t	   Godan verb with `tsu' ending
+v5n	   Godan verb with `nu' ending
+v5b	   Godan verb with `bu' ending
+v5m	   Godan verb with `mu' ending
+v5r	   Godan verb with `ru' ending
+v5k-s	   Godan verb - Iku/Yuku special class
+v5z	   Godan verb - -zuru special class (alternative form of -jiru verbs)
+v5aru	   Godan verb - -aru special class
+v5uru	   Godan verb - Uru old class verb (old form of Eru)
+vi 	   intransitive verb 
+vs 	   noun or participle which takes the aux. verb suru
+vs-i	   suru verb - irregular
+vs-s	   suru verb - special class
+vk	   Kuru verb - special class
+vt 	   transitive verb
+vulg 	   vulgar expression or word 
+X	   rude or X-rated term (not displayed in educational software)
+
+

+Multiple Senses +

+

+From the 2001 editions of EDICT, the differing senses associated with +the Japanese head-words are being progessively marked. The marking takes the +form of a "(1)", "(2)", etc. in front of the senses. +

+

+Spellings +

+

+I have endeavoured to cater for many possible variants of English translation +and spelling. Where appropriate different translations are included for +national variants (e.g. autumn/fall). I use Oxford (British) standard +spelling (-our, -ize) for the entries I make, but I leave other entries in +the national spelling of the submitter. +

+

+At some stage in the future I intend to regularize the English spellings in such +a way that allows searches on either British or American spellings +to be successful. +

+

+Gairaigo and Regional Words +

+

+For gairaigo which have not been derived from English words, I have attempted +to indicate the source language and the word in that language. Languages have +been coded in the two-letter codes from the ISO 639:1988 "Code for the +representation of names of languages" standard, e.g. "(fr: avec)". See +Appendix C for more on this. (Thanks to Holger Gruber for suggesting this +language coding.) +

+

+In addition to the language codes described in Appendix C, a number of tags +are used to indicate that a word or phrase is associated with a particular +regional language variant within Japan. The tags are: +

+

+

+
+kyb	Kyoto-ben
+osb	Osaka-ben
+ksb	Kansai-ben
+ktb	Kantou-ben
+tsb	Tosa-ben
+
+

+In the case of gairaigo which have a meaning which is not apparent from the +original (English) words, the literal transcription is included, with +the tag (lit). +

+

+NEW JMDICT PROJECT +

+

+Early in 1999 work began on the JMdict project, which aims to extend the +structure and content of the EDICT file to enable it to contain +additional information and provided an improved service to users. +

+

+The project has several broad goals: +

+
    +
  1. to convert the EDICT file to a new dictionary structure which overcomes +the deficiencies in the current structure. With regard to this goal, the +particular structural and content aspects to be addressed include, but +are not limited to: +
      +
    1. the handling of orthographical variation (e.g. in kanji +usage, okurigana usage, readings) within the single entry; +
    2. +
    3. additional and more appropriately associated tagging of +grammatical and other information; +
    4. +
    5. provision for separation of different senses (polysemy) in +the translations; +
    6. +
    7. provision for the inclusion of translational equivalents +from several languages; +
    8. +
    9. provision for inclusion of examples of the usage of words; +
    10. +
    11. provision for cross-references to related entries. +
    12. +
    +
  2. +
  3. to publish the dictionary in a standard format which is accessible +by a wide range of software tools; [It is proposed that this goal be +addressed by developing the structure so that it can be released as +an XML document, with an associated XML DTD. +
  4. +
  5. to retain backward compatibility with the original EDICT structure in +order to enable legacy software systems to use later versions of the +EDICT files. +
  6. +
+For more information on the JMdict project, please see the documentation +files. +

+By May 1999 the EDICT file had been converted into the new format. A major +part of this consisted of identifying and combining entries which were +effectively variants of each other. +

+

+Since V99-002, the EDICT file has been generated from the new format. +This has meant: +

+
    +
  1. a marginal increase in the number of entries, as there is an increased +number of variants; +
  2. +
  3. the English fields of the variant entries are now exactly the same, +as they have generated from the single expanded entry; +
  4. +
  5. the tags such as (vs), (an), etc. now appear before the first word +of the English fields. +
  6. +
+USAGE +

+EDICT can be freely used provided satisfactory acknowledgement is made, +and a number of other conditions are met. +Consult the Licence Statement information at Appendix A. +

+

+It is, of course, the main dictionary used by PD and GPL Copyright software +such as JDIC, JREADER, XJDIC, MacJDic, etc. It can be used as the +dictionary within MOKE (it may need to be renamed JTOE.DCT if used with +version 2.1 of MOKE), and it is also used by the NJSTAR and JWP Word +Processor packages. +

+

+CONTRIBUTIONS +

+

+I will be delighted if people send me corrections, suggestions, and ESPECIALLY +additions. Before ripping in with a lot of suggestions, make sure you have the +latest version, as others may have already made the same comments. +

+

+The preferred format for submissions is a JIS, EUC or Shift-JIS file (uuencoded +for safety) containing replacement/new entries. This can be emailed to me at +the address at the end of this file. +

+

+Feel free to use the following format: +

+

+

+
+NEW: KANJI1 [kana1] /new entry #1/
+
+NEW: KANJI2 [kana2] /new entry #2/
+
+old: KANJI3 [kana3] /old entry to be replaced/
+new: KANJI3 [kana3] /replacement entry/
+
+DEL: KANJI4 [kana4] /entry to be deleted/
+
+

+Please provide an annotated reason for any deletions or amendments you send. +

+

+I prefer not to get a "diff" or "patch" file as the master EDICT is under +continuous revision, and may have had quite a few changes since you got your +copy. +

+

+Users intending to make submissions to EDICT should follow the following +simple rules: +

+ +

+ACKNOWLEDGEMENTS +

+

+The following people, in roughly chronological order, have played a part in +the development of EDICT. (I stopped adding to this list some years ago, so +it is of historical interest now.) +

+

+Mark Edwards, Spencer Green, Alina Skoutarides, Takako Machida, Theresa +Martin, Satoshi Tadokoro, Stephen Chung, Hidekazu Tozaki, Clifford Olling, +David Cooper, Ken Lunde, Joel Schulman, Hiroto Kagotani, Truett Smith, Mike +Rosenlof, Harold Rowe, Al Harkom, Per Hammarlund, Atsushi Fukumoto, John +Crossley, Bob Kerns, Frank O'Carroll, Rik Smoody, Scott Trent, Curtis +Eubanks, Jamie Packer, Hitoshi Doi, Thalawyn Silverwood, Makato Shimojima, +Bart Mathias, Koichi Mori, Steven Sprouse, Jeffrey Friedl, Yazuru Hiraga, Kurt +Stueber, Rafael Santos, Bruce Casner, Masato Toho, Carolyn Norton, Simon +Clippingdale, Shiino Masayoshi, Susumu Miki, Yushi Kaneda, Masahiko +Tachibana, Naoki Shibata, Yuzuru Hiraga, Yasuaki Nakano, Atsu Yagasaki, +Hitoshi Oi, Chizuko Kanazawa, Lars Huttar, Jonathan Hanna, Yoshimasa Tsuji, +Masatsugu Mamimura, Keiichi Nakata, Masako Nomura, Hiroshi Kamabe, Shi-Wen +Peng, Norihiro Okada, Jun-ichi Nakamura, Yoshiyuki Mizuno, Minoru Terada, +Itaru Ichikawa, Toru Matsuda, Katsumi Inoue, John Finlayson, David Luke, Iain +Sinclair, Warwick Hockley, Jamii Corley, Howard Landman, Tom Bryce, Jim +Thomas, Paul Burchard, Kenji Saito, Ken Eto, Niibe Yutaka, Hideyuki Ozaki, +Kouichi Suzuki, Sakaguchi Takeyuki, Haruo Furuhashi, Takashi Hattori, +Yoshiyuki Kondo, Kusakabe Youichi, Nobuo Sakiyama, Kouhei Matsuda, Toru Sato, +Takayuki Ito, Masayuki Tokoshima, Kiyo Inaba, Dan Cohn, Yo Tomita, Ed Hall, +Takashi Imamura, Bernard Greenberg, Michael Raine, Akiko Nagase, Ben Bullock, +Scott Draves, Matthew Haines, Andy Howells, Takayuki Ito, Anders Brabaek, +Michael Chachich, Masaki Muranaka, Paul Randolph, Vesa Karhu, Bruce Bailey, +Gal Shalif, Riichiro Saito, Keith Rogers, Steve Petersen, Bill Smith, Barry +Byrne, Satoshi Kuramoto, Jason Molenda, Travis Stewart, Yuichiro Kushiro +Keiko Okushi, Wayne Lammers, Koichi Fujino, Joerg Fischer, Satoru Miyazaki, +Gaspard Gendreau, David Olson, Peter Evans, Steven Zaveloff, Larry Tyrrell, +Heinz Clemencon, Justin Mayer, David Jones, Holger Gruber, David Wilson, +John De Hoog, Stephen Davis, Dan Crevier, Ron Granich, Bruce Raup, Scott +Childress, Richard Warmington, Jean-Jacques Labarthe, Matt Bloedel, Szabolcs +Varga, Alan Bram, Hidetaka Koie, David Villareale, Hirokazu Ohata, Toshiki +Sasabe, William Maton, Tom Salmon, Kian Yap, Paul Denisowski, Glen Pankow, +Richard Northcott, Roger Meunier, Petteri Kettunen, Jeff Korpa, Kanji +Haitani, Liam O'Brien, Serdar Yegulalp, Jonathan Way, Gururaj Rao, Yoichiro +Niitsu, Ralph Seewald, Andreas Jordell, Chua Hian Koon, Hartmut Pilch, +Shouichi Takeuchi, Ayumu Yasutomi, Mike Wright, James Rose, Nich Hill. +

+

+Jim Breen +
+jwb@csse.monash.edu.au +
+School of Computer Science & Software Engineering +
+Monash University +
+Clayton 3168 +
+AUSTRALIA +


+APPENDIX A: EDICT LICENCE STATEMENT +

+

+In March 2000, James William Breen assigned ownership of the copyright +of the dictionary files assembled, coordinated and edited by him to the +The Electronic Dictionary Research and Development Group at Monash +University. +

+

+EDICT can be freely used provided satisfactory acknowledgement is made, +and a number of other conditions are met. +Information about the licence and copyright for EDICT can be found on +the Group's WWW page at: http://www.csse.monash.edu.au/groups/edrdg/ +

+

+In summary, EDICT can be freely used with acknowledgement. +

+

+


+APPENDIX B. LANGUAGE CODES FROM ISO 639 +

+

+The following language codes have been used with non-English derived +gairaigo. They have been derived from the ISO 639:1988 "Code for the +representation of names of languages" standard. +

+

+

+
+ar 	Arabic
+zh 	Chinese (Zhongwen)
+de 	German (Deutsch)
+en 	English
+fr 	French
+el 	Greek (Ellinika)
+iw 	Hebrew (Iwrith)
+ja 	Japanese
+ko 	Korean
+nl 	Dutch (Nederlands)
+no 	Norwegian
+pl 	Polish
+ru 	Russian
+sv 	Swedish
+bo 	Tibetan (Bodskad)
+eo 	Esperanto
+es 	Spanish
+in 	Indonesian
+it 	Italian
+lt 	Latin
+pt 	Portugese
+hi 	Hindi
+ur 	Urdu
+mn 	Mongolian
+kl 	Inuit (formerly Eskimo)
+
+

+And I have added the following, which are not in the Standard: +

+

+

+
+ai 	Ainu
+
+ +
Disclaimer \ No newline at end of file -- cgit v1.2.1