An Overview and Some Critical Observations
from Part IV - Arabic Computational and Corpus Linguistics
Published online by Cambridge University Press: 23 September 2021
Mark Van Mol provides a critical review of the issues involved in constructing usable Arabic corpora and of the solutions that programmers have attempted. One such issue is whether a corpus is made freely available or placed behind a paywall. This distinction often correlates with corpus size as well: freely available corpora are generally larger and not tagged for parts of speech (POS), whereas those behind paywalls are smaller and POS-tagged. The reason is clear: POS tagging requires large amounts of painstaking labour, whereas collecting large amounts of text from the Internet with web-scraping applications can be done in seconds. Corpus sizes are themselves difficult to compare because they are quantified in different units. Size may be expressed as the number of articles, hours, tokens, kilobytes, megabytes, sentences, words, and sometimes paragraphs that the corpus encompasses. One reason for this variety is that defining the searchable units of Arabic texts presents complications. Such considerations pertain directly to questions of corpus representativeness. With that arises the question of the nature of the phenomenon under scrutiny: whether the corpora are intended to represent Classical Arabic, modern written Arabic, or Arabic dialects.
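The difficulty of comparing corpus sizes can be illustrated with a minimal Python sketch that measures one short Arabic sample in several of the units listed above. The function name `corpus_size_measures`, the sample sentence, and the punctuation-based sentence splitter are assumptions made for illustration only; they are not taken from the chapter. Counting tokens properly would additionally require separating clitics that Arabic orthography attaches to words, which this sketch does not attempt.

```python
import re

def corpus_size_measures(text: str) -> dict:
    """Report the size of one text under several common units (illustrative only)."""
    # Whitespace-delimited words: the simplest possible notion of a "word".
    words = text.split()
    # Rough sentence split on Latin and Arabic terminal punctuation (an assumption;
    # real corpora need a more careful sentence segmenter).
    sentences = [s for s in re.split(r"[.!?؟]+", text) if s.strip()]
    return {
        "characters": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "whitespace_words": len(words),
        "sentences": len(sentences),
    }

# Hypothetical two-sentence sample, not drawn from any corpus discussed in the chapter.
sample = "وكتب الولد الدرس. هل قرأ الكتاب؟"
sizes = corpus_size_measures(sample)

for unit, count in sizes.items():
    print(f"{unit}: {count}")
```

Even on this tiny sample the figures diverge: the byte count exceeds the character count because each Arabic letter occupies two bytes in UTF-8, and the whitespace word count would differ from a token count once attached clitics are segmented. Sizes reported in different units therefore cannot be compared directly.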