Choosing the Right Machine Learning Technology
Machine Learning (ML) has been one of the most interesting and sought-after skills in the programming world over the last few years. It is a powerful tool used in products and services that make our everyday life a little bit easier.
There are several different types of problems that can be solved with ML. The problem we ran into, organizing and labelling a large set of data, belongs to a category called supervised learning.
Supervised learning
Supervised learning is the most common and most studied type of Machine Learning. A model is trained with labeled data, meaning that every training example comes with the correct answer. There are usually two types of problems: regression, which predicts a numeric value, and classification.
Classification is used for putting something into categories, which happens to be our current problem. We had to categorize a large amount of textual data, so we will focus on this type of ML and share our path to a solution.
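To make the idea of labeled data more concrete, here is a minimal sketch in plain Python. It is only an illustration and not the approach we ended up using: the training texts, the labels ("finance", "support") and the word-counting scoring rule are all made up for this example.

```python
from collections import Counter, defaultdict

# Hypothetical labeled training data: (text, label) pairs.
TRAINING_DATA = [
    ("invoice payment due next week", "finance"),
    ("quarterly budget and payment report", "finance"),
    ("server outage and database error", "support"),
    ("login error after password reset", "support"),
]


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = defaultdict(Counter)
    for text, label in examples:
        word_counts[label].update(text.lower().split())
    return word_counts


def predict(word_counts, text):
    """Return the label whose training words overlap most with the text."""
    words = text.lower().split()
    scores = {
        label: sum(counts[word] for word in words)
        for label, counts in word_counts.items()
    }
    return max(scores, key=scores.get)


model = train(TRAINING_DATA)
print(predict(model, "payment for the invoice"))
print(predict(model, "database error on login"))
```

The "training" step here only learns which words go with which label. Real classifiers learn far richer patterns, but the workflow is the same: labeled examples go in, and a function that assigns a category to new text comes out.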
Text classification
As more and more data is being created each day, it’s hard for humans to keep up. All this information and text has to be organized somehow. That’s where text classification comes in. By applying certain algorithms, text classification enables you to categorize text in a reliable, accurate, and cost-effective way.
Text classification is one of the most common problems in supervised learning. After some research and testing of several technologies, we’ve decided to share our experience and our path to solving this type of problem.
There are many approaches to this type of problem, so the real question is: how do you choose the right one?
Choosing the Right Dependency
When it comes to choosing the right library for a project, there are certain criteria developers must keep in mind. Some of them are portability, quality, maintenance status, and efficiency.
The library has to be regularly updated. It’s also a good sign if it has an active community, such as a forum where you can ask questions and discuss problems with other developers.
As a .NET oriented company, we first tried to find a solution within C#. However, we concluded that doing the training part in Python with the Keras library (which runs on top of TensorFlow) is a very good solution. Keras is the high-level API of TensorFlow 2.0. It offers a highly productive interface for solving ML problems and supports fast iteration. It’s a very good option because it’s easy to use and lowers the ML learning curve, which allowed us to run quick test iterations without digging too deep into the algorithms themselves.
The initial tests have shown very good results.
Text Classification Algorithms
When it comes to algorithms, the best way to choose the right one is to recognize which solution fits your problem best. We’ve tried the deep learning methods explained below because neural networks extract features automatically, which is very convenient.
CNN – Convolutional Neural Networks are built of multiple layers that apply filters to their inputs. The filters extract features from the input data, and those features are then used to make predictions.
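The filtering step can be sketched in a few lines of plain Python. This is only an illustration: the function names, the input signal and the hand-picked filter values are assumptions made for the example. In a real CNN the filter values are learned during training, and a library such as Keras does this work for you.

```python
def conv1d(sequence, kernel):
    """Slide the kernel over the sequence, one output value per position.

    No padding is used, so the output is shorter than the input.
    """
    width = len(kernel)
    return [
        sum(sequence[i + j] * kernel[j] for j in range(width))
        for i in range(len(sequence) - width + 1)
    ]


def relu(values):
    """Keep positive responses and set negative ones to zero."""
    return [max(0.0, v) for v in values]


def global_max_pool(values):
    """Summarize a feature map by its strongest response."""
    return max(values)


# Hypothetical input and a hand-picked filter that reacts to a rising step.
signal = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
rising_edge_filter = [-1.0, 1.0]

feature_map = relu(conv1d(signal, rising_edge_filter))
strongest_response = global_max_pool(feature_map)
print(feature_map)
print(strongest_response)
```

The feature map is non-zero only where the signal jumps from 0 to 1, which is the "feature" this particular filter detects. For text, the same idea is applied to sequences of word vectors, so a filter can learn to react to short phrases.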
On the other hand, RNN – Recurrent Neural Networks process the input as a sequence. At every step, the network combines the current input with a memorized state computed from the previous inputs, and passes the updated state on to the next step.
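The memorizing behaviour can be shown with a single recurrent unit in plain Python. Again, this is a simplified sketch: the weight values and the function name are assumptions chosen for the example, whereas a real RNN learns its weights and uses many units.

```python
import math


def rnn_forward(inputs, w_input, w_hidden, bias):
    """Run a one-unit recurrent cell over a sequence.

    Returns the hidden state after every step.
    """
    hidden = 0.0  # the "memory" starts out empty
    states = []
    for x in inputs:
        # The new state depends on the current input AND the previous state.
        hidden = math.tanh(w_input * x + w_hidden * hidden + bias)
        states.append(hidden)
    return states


# Hypothetical weights, chosen by hand for illustration.
weights = dict(w_input=1.0, w_hidden=0.5, bias=0.0)

early_signal = rnn_forward([1.0, 0.0, 0.0], **weights)
no_signal = rnn_forward([0.0, 0.0, 0.0], **weights)
print(early_signal)
print(no_signal)
```

Even though the last two inputs are identical in both sequences, the final state of the first one is still non-zero, because the cell carries a fading trace of the earlier input. This is what makes RNNs a natural fit for text, where the meaning of a word depends on the words before it.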
It’s important to say that these algorithms work better if given a large set of data.
In conclusion, every problem deserves an approach that fits it. If you understand the given problem, it’ll be easier to find a suitable algorithm and solve it. Keep in mind all of the criteria for choosing the right library, and discuss with your ML specialists why a certain choice is the best. Their skills and knowledge will also have an impact on what type of technology to use, so make sure to collect feedback from all team members. Always keep up with news and updates, discuss with your team, and you will surely be on the right path.