Project Ramadan: How can Data explain the Quran?

During this holy time, and with some enthusiasm for data, I can't help but ask: how can data explain the Quran?

Imagine a dataset with four columns: Word, Verse/Aya, Soora and Juz. The Quran is structured into ‘Soora’, each made up of ‘Aya’ (verses), which in turn are made up of words. Separately, the whole Quran is also divided into 30 parts, each called a Juz. In other words, imagine the Quran as a book with 114 chapters (Soora) whose entire text is also split into 30 parts (Juz).
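To make the one-row-per-word layout concrete, here is a minimal Python sketch of how nested text could be flattened into that table. The names `WordRow`, `flatten` and `juz_of`, and the input format (a dict of Soora number to a list of verse strings), are hypothetical; the sketch also assumes words are separated by whitespace. The Juz is looked up per verse because a Juz boundary can fall in the middle of a Soora.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordRow:
    """One row of the planned dataset: a single word and where it occurs."""
    word: str
    aya: int    # verse number within its Soora (1-based)
    soora: int  # chapter number, 1..114
    juz: int    # part number, 1..30


def flatten(sooras, juz_of):
    """Turn nested text into one WordRow per word.

    sooras: hypothetical input, dict of soora number -> list of verse strings,
            with verse 1 first.
    juz_of: hypothetical lookup function (soora, aya) -> juz number.
    """
    rows = []
    for soora, verses in sorted(sooras.items()):
        for aya, text in enumerate(verses, start=1):
            juz = juz_of(soora, aya)
            # Assumption: words are whitespace-separated in the source text.
            for word in text.split():
                rows.append(WordRow(word, aya, soora, juz))
    return rows
```

With placeholder words, `flatten({1: ["alpha beta", "beta gamma delta"]}, lambda soora, aya: 1)` yields five rows, the third being `WordRow(word='beta', aya=2, soora=1, juz=1)`.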

If we can align each word to its Verse, Soora and Juz, we can determine a great many things about the Quran, such as:
– The most frequent word
– The most frequent word in each Juz
– The most frequent word in each chapter
– The least frequent word
– The average string length of each verse
– The verse with the most words
– The Soora with the most
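Once such a table exists, most of the questions above reduce to counting. The sketch below is one possible way to answer them with Python's standard `collections` module, assuming each row is a plain `(word, aya, soora, juz)` tuple; the function names are hypothetical. "Average string length of each verse" is interpreted here as the mean number of characters per word in a verse, and note that `len()` counts Unicode code points, so Arabic diacritics count as characters.

```python
from collections import Counter, defaultdict

# Column positions in each row: (word, aya, soora, juz)
WORD, AYA, SOORA, JUZ = range(4)


def most_said_word(rows):
    """(word, count) for the most frequent word; ties go to the first word seen."""
    return Counter(row[WORD] for row in rows).most_common(1)[0]


def least_said_word(rows):
    """(word, count) for the least frequent word; ties go to the first word seen."""
    counts = Counter(row[WORD] for row in rows)
    return min(counts.items(), key=lambda item: item[1])


def most_said_word_by(rows, column):
    """Most frequent word within each Soora (column=SOORA) or Juz (column=JUZ)."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[column]][row[WORD]] += 1
    return {key: counts.most_common(1)[0] for key, counts in sorted(groups.items())}


def mean_word_length_per_verse(rows):
    """Average characters per word for each (soora, aya) verse."""
    total_chars = Counter()
    word_count = Counter()
    for row in rows:
        verse = (row[SOORA], row[AYA])
        total_chars[verse] += len(row[WORD])
        word_count[verse] += 1
    return {verse: total_chars[verse] / word_count[verse] for verse in word_count}


def verse_with_most_words(rows):
    """((soora, aya), word_count) for the verse with the most words."""
    return Counter((row[SOORA], row[AYA]) for row in rows).most_common(1)[0]
```

For example, `most_said_word_by(rows, JUZ)` answers the per-Juz question, and passing `SOORA` instead gives the per-chapter answer.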

The fun part is that I could not find a ready-made dataset on Kaggle that serves this purpose. So I have decided to start my very first project, which I will gladly submit to Kaggle once I am done: a dataset with the four columns Word, ‘Aya’, ‘Soora’ and ‘Juz’.

I have found an API for Quran.com and am currently studying its documentation so that I can write the code to fetch the verses.

Wish me luck!
