This post follows on from part 1 to provide some practical advice on how to get started with Machine Learning.
Data science is the practice of extracting knowledge and insights from data.
It’s an interdisciplinary field that combines technical know-how from several disciplines, including mathematics, statistics, computer science & machine learning. It’s this varied skill set that allows a data scientist to extract the knowledge hidden within data.
How much data do you need?
It’s often said you need Big Data to enable ML & AI. Having a lot of data helps, but the claim is not totally accurate: a smaller data set of very clean data is far better than a larger data set of dirty data. Clean data is data where all values are present and accurate (no missing values, formatting issues, etc.).
The more clean data you have the better, but you can sometimes get good results with only hundreds of really clean, accurate records.
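As a rough illustration of what "clean" means in practice, the sketch below splits a handful of records into clean and dirty ones using only the Python standard library. The sample data, the column names (`customer_id`, `age`, `signup_date`) and the helper names `is_clean` and `split_clean` are all made up for this example, and the checks are assumptions about what "present and accurate" could mean for such a data set.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample data; the columns are illustrative only.
RAW = """customer_id,age,signup_date
1,34,2021-03-01
2,,2021-03-05
3,forty,2021-04-11
4,29,11/04/2021
5,51,2021-05-20
"""


def is_clean(row):
    """Return True if every value is present and in the expected format."""
    # Missing values: an empty or absent field makes the record dirty.
    if any(value is None or value.strip() == "" for value in row.values()):
        return False
    # Formatting issue: age must be a whole number written in digits.
    if not row["age"].isdigit():
        return False
    # Formatting issue: dates must follow the YYYY-MM-DD pattern.
    try:
        datetime.strptime(row["signup_date"], "%Y-%m-%d")
    except ValueError:
        return False
    return True


def split_clean(raw_csv):
    """Parse CSV text and return (clean_rows, dirty_rows)."""
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    clean = [r for r in rows if is_clean(r)]
    dirty = [r for r in rows if not is_clean(r)]
    return clean, dirty


clean, dirty = split_clean(RAW)
print(f"{len(clean)} clean records, {len(dirty)} dirty records")
```

Counting the dirty records before any modelling gives a quick sense of how much of the data is actually usable.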
Domain knowledge is deep, specialised knowledge of a particular endeavour or business domain. Having access to this knowledge usually helps when interpreting data and extracting knowledge from it.
Start Small & Iterate
Large projects are commonly associated with large risk. Small projects, on the other hand, are cheap, great for learning & far more likely to succeed. Once you have something working, use the Agile & DevOps mindset to iterate and build up a solution.
Start without Coding
You can do a great deal with cloud-based ML solutions. AWS, Azure & Google all have cheap, accessible options. These services take away some of the complexity and allow you to prototype and test out ideas quickly & easily.
It may not always be possible to leverage these, for example if you have bandwidth limitations and are working with real-time image or video processing. If so, there are still great options that run locally, from TensorFlow to Python libraries like scikit-learn.
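To give a feel for how small a first local prototype can be, here is a minimal sketch using scikit-learn. The choice of the bundled Iris data set (150 records), a logistic regression model and 5-fold cross-validation are assumptions made for illustration, not recommendations for any particular problem.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Small, clean example data set that ships with scikit-learn (150 records).
X, y = load_iris(return_X_y=True)

# A simple baseline classifier; max_iter is raised so the solver converges.
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: train on 4/5 of the data, score on the rest,
# and repeat so that every record is used for testing exactly once.
scores = cross_val_score(model, X, y, cv=5)
mean_accuracy = scores.mean()

print(f"Mean accuracy over 5 folds: {mean_accuracy:.2f}")
```

Swapping in your own data and a different estimator keeps the same shape, which makes a prototype like this easy to iterate on.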
Start small, be aware of the limitations, prototype, iterate & refine.
Following a simple process like the one above can help reduce the risks associated with Machine Learning. It’s also sensible to use any predictions to augment business decisions - not drive them - at least until you have full confidence in the technology and its output.