
1. Introduction
If there's one thing an interesting dataset and a good rerun of a football match on TV have in common, it's that they both do an excellent job of keeping everyone safe at home during the pandemic. In all honesty, I'm not a data scientist, nor a dev guru. I only recently got exposed to Machine Learning, Data Mining and Artificial Intelligence in general, working in Python (Pandas, NumPy and scikit-learn libraries) for a little over three months. Without further ado, here's my take on the FIFAconundrum challenge.
2. Dataflow
The challenge is not to 'predict' any variable, but rather to 'group' or 'cluster' the existing dataset by the players' skill sets in relation to their wages. Here's what my current flow looks like. Don't worry much about the 2 additional datasets; they're merely exported from the existing model so that I can explore them further later on. To follow along, here's the link to download the dataset from my GitHub repository.
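To make the 'cluster, don't predict' idea concrete, here is a minimal sketch using k-means from scikit-learn. This is only an illustration under assumptions: the algorithm, the number of clusters, and the toy numbers are all made up for the example and are not taken from my actual flow.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: each row is [skill rating, weekly wage].
X = np.array([
    [60, 1_000], [62, 1_200], [65, 1_500],      # lower skill, lower wage
    [85, 40_000], [88, 45_000], [90, 50_000],   # higher skill, higher wage
], dtype=float)

# Scale first so the wage column (large numbers) does not dominate the distances.
X_scaled = StandardScaler().fit_transform(X)

# n_clusters=2 is an assumption for this toy example.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X_scaled)

print(labels)  # one cluster id per player; which group gets id 0 or 1 is arbitrary
```

There is no target column here: the model only receives the features and returns a group label per player, which is the difference from a prediction task.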
3. Prepare Recipes
Here's how I went about the prepare recipes, nothing out of the ordinary: converting categorical values to numerical ones through one-hot encoding and filling in the 'NaN' entries with median values, while grouping the steps for better clarity in case I ever need to go back and revise anything.
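For anyone who prefers code to a visual recipe, the same two steps can be sketched roughly in pandas. This is a minimal sketch rather than the actual recipe; the column names and values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy data; these column names are not the real dataset's.
players = pd.DataFrame({
    "preferred_foot": ["Left", "Right", "Right", None],
    "stamina": [70.0, None, 80.0, 90.0],
    "wage": [1000.0, 3000.0, None, 5000.0],
})

# Fill missing numeric values with each column's own median.
numeric_cols = players.select_dtypes(include="number").columns
players[numeric_cols] = players[numeric_cols].fillna(players[numeric_cols].median())

# One-hot encode the categorical column; a missing category becomes an all-zero row.
prepared = pd.get_dummies(players, columns=["preferred_foot"], dtype=int)

print(prepared)
```

The median is used instead of the mean because it is less sensitive to extreme values, which matters for a skewed column such as wages.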
"Creating a flow that outputs a value proposition in terms of wages"
Correlation Matrix Values
- 1.0 means a perfect correlation
- 0.0 means no correlation
- -1.0 means a perfect "inverse" correlation
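The three reference values above can be reproduced with a tiny pandas example. It is a sketch on made-up numbers, and the column names are hypothetical; it only shows how to read a correlation matrix, not the results from the FIFA dataset.

```python
import pandas as pd

# Hypothetical toy columns, built so the correlations with "overall" are easy to read.
skills = pd.DataFrame({
    "overall": [60, 70, 80, 90],
    "wage": [1000, 2000, 3000, 4000],  # rises in step with overall
    "years_left": [4, 3, 2, 1],        # falls in step with overall
    "shirt_number": [1, 2, 2, 1],      # no linear relation with overall
})

# Pearson correlation between every pair of columns.
corr = skills.corr()

print(corr["overall"].round(2))
```

Reading the "overall" column of the matrix, wage sits at 1.0, years_left at -1.0, and shirt_number at 0.0 (up to floating-point rounding). Real data will almost always land somewhere in between.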
I've certainly enjoyed exploring this dataset, and it was fun doing it. Stay safe, everyone, and leave a thumbs-up if you liked the article and found it useful. 😊