Indonesian is one of the world's major languages and is increasingly used on the web. As more unstructured Indonesian language data becomes available in digital form, language resources are needed to improve machine translation, data mining and other computational linguistic tasks. This project aims to build such resources by carrying out research on Indonesian to create a robust computational grammar, corpus and lexicon (including social variation) within the Pargram framework. Pargram is an international collaborative project that develops computational grammars for many languages within a shared linguistic framework based on common linguistic assumptions. The outcomes of this project should lead both to a better understanding of Indonesian grammar and to reliable machine-usable language resources.
The project will bring together a multi-site interdisciplinary team of investigators: Jane Simpson (Linguistics, University of Sydney, Australia), I Wayan Arka and Avery Andrews (Linguistics, ANU, Australia), and Mary Dalrymple (Linguistics and Philology, Oxford University, U.K., and PARC, California). This international team is well placed to work effectively with the wider Pargram research group. Our team will also collaborate with Indonesian researchers from the Faculty of Computer Science, Universitas Indonesia, Jakarta. In addition, the team will include a computational linguistics research associate, a PhD student and a data manager/programmer.
The initial stages of our research in 2007 were funded by a near-miss grant from the University of Sydney. The major component of the project is funded by an ARC Discovery Grant.
Indonesian, or Bahasa Indonesia, is an Austronesian language of the Malayo-Polynesian branch. It is the mother tongue of approximately 17 million speakers worldwide. Its writing system, or orthography, is based on the Latin alphabet. Indonesian is the official language of Indonesia. It is widely spoken throughout Indonesia and in East Timor, and is spoken as a minority language in several neighbouring countries, including the Philippines, Thailand and Australia.
The canonical word order of Indonesian is SVO (subject-verb-object) in the active voice. Indonesian employs a complex system of affixation, exhibiting both inflectional and derivational morphology. It also has a system of nominal classifiers, although these are no longer used strictly.
Indonesia is a country of many diverse societies, whose variety reflects the diversity of its environments, from coastal to inland and from agricultural to horticultural to seafaring. Their cultures have been shaped by interactions with colonisers and migrants, as well as by the introduction of many religions, including Islam and Christianity.
Because of the two countries' geographic proximity, the Australian government regards its relationship with Indonesia as very important.