Learn and Predict the Gender of German Nouns

The German language is known to be relatively complicated, and grammatical gender in particular causes a lot of confusion. While English has only one definite article (the), German uses three: der (masculine), die (feminine) and das (neuter). Rules for determining the gender of a noun do exist, but almost no native German speaker can name them. We can sidestep this problem, that is, determine the gender without memorizing the rules, using some simple machine learning with the Accord framework.

Let's quickly name the steps that will follow:

1. find and extract a dataset of noun-gender associations
2. split it into training, test and validation datasets
3. extract features into something the algorithm can use
4. train a Naive Bayes classifier
5. test the model with the test dataset

After quite a while of searching, I found this machine-readable, CC-BY-SA 4.0 licensed XML file from Daniel Naber. In our Universal Windows App we can then load all nouns into…
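To make the feature-extraction, training and testing steps listed earlier more concrete, here is a minimal sketch in plain C#. It is not the Accord API and not necessarily the approach used in this post: the class `SuffixNaiveBayes`, the choice of features (the last one, two and three letters of a noun) and the ten hand-picked training nouns are all assumptions made purely for illustration. A real run would use the full dataset and a library classifier instead of the hand-written counting.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: a tiny categorical Naive Bayes over noun suffixes.
public sealed class SuffixNaiveBayes
{
    private const int Slots = 3; // assumed features: suffixes of length 1, 2 and 3
    private readonly Dictionary<string, int> classCounts = new Dictionary<string, int>();
    private readonly Dictionary<(string, int, string), int> suffixCounts =
        new Dictionary<(string, int, string), int>();
    private readonly HashSet<string>[] seen =
        { new HashSet<string>(), new HashSet<string>(), new HashSet<string>() };
    private int total;

    // Turns a noun into its feature vector: one suffix per slot.
    public static string[] Features(string noun)
    {
        string w = noun.ToLowerInvariant();
        var result = new string[Slots];
        for (int n = 1; n <= Slots; n++)
            result[n - 1] = w.Substring(Math.Max(0, w.Length - n));
        return result;
    }

    // Training is just counting articles and (article, slot, suffix) triples.
    public void Learn(string noun, string article)
    {
        total++;
        classCounts.TryGetValue(article, out int c);
        classCounts[article] = c + 1;
        string[] f = Features(noun);
        for (int slot = 0; slot < Slots; slot++)
        {
            seen[slot].Add(f[slot]);
            var key = (article, slot, f[slot]);
            suffixCounts.TryGetValue(key, out int k);
            suffixCounts[key] = k + 1;
        }
    }

    // Picks the article with the highest log prior plus summed log likelihoods.
    public string Predict(string noun)
    {
        string[] f = Features(noun);
        string best = null;
        double bestScore = double.NegativeInfinity;
        foreach (var pair in classCounts)
        {
            double score = Math.Log((double)pair.Value / total);
            for (int slot = 0; slot < Slots; slot++)
            {
                suffixCounts.TryGetValue((pair.Key, slot, f[slot]), out int k);
                // Add-one smoothing, with one extra bucket for unseen suffixes.
                score += Math.Log((k + 1.0) / (pair.Value + seen[slot].Count + 1));
            }
            if (score > bestScore) { bestScore = score; best = pair.Key; }
        }
        return best;
    }
}

public static class Program
{
    public static void Main()
    {
        // Hand-picked illustration data, not the XML dataset mentioned in the post.
        var training = new (string Noun, string Article)[]
        {
            ("Zeitung", "die"), ("Wohnung", "die"), ("Freiheit", "die"), ("Gesundheit", "die"),
            ("Mädchen", "das"), ("Häuschen", "das"), ("Brötchen", "das"),
            ("Lehrer", "der"), ("Computer", "der"), ("Fahrer", "der"),
        };
        var model = new SuffixNaiveBayes();
        foreach (var (noun, article) in training)
            model.Learn(noun, article);

        // Held-out nouns play the role of the test dataset.
        foreach (var noun in new[] { "Krankheit", "Kätzchen", "Bäcker" })
            Console.WriteLine(model.Predict(noun) + " " + noun);
    }
}
```

With this toy data the three held-out nouns should come out as die Krankheit, das Kätzchen and der Bäcker, simply because their endings (-heit, -chen, -er) appear in the training list. Suffixes are only one plausible feature choice; the point of the sketch is that the classifier learns the ending-to-gender associations from counts rather than from hand-written rules.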