The basic technical ideas behind Data Science have been around for decades, so why has Data Science with Machine Learning algorithms started working so well only in recent years? And why is a country like Pakistan unable to take advantage of such a growing market?
Rise of Data Science
Over the last decade or two, we went from having a relatively small amount of data to often having an enormous amount of it. This was possible because of the digitization of society: so much human activity now takes place in digital form. We spend a great deal of time on computers, websites, and mobile apps, and all of that activity on digital devices creates data. With the rise of inexpensive cameras built into our mobile phones and the many sensors in IoT devices, we have been collecting more and more of it. As a result, for many applications we have accumulated far more data than traditional learning algorithms could take advantage of.
Two factors have driven this progress:
- Amount of data available for a task
- Computational power/performance
Nowadays, one of the most reliable ways to get better performance in Data Science is to either train a bigger Machine Learning model or throw more data at it. That only works up to a point, for two main reasons:
- Eventually, you run out of data.
- Eventually, your model becomes so big that it takes too long to train.
Why is there an increased demand for Data Scientists?
Nearly every company now has the ability to collect data, and the amount of data keeps growing. As it grows, so does the need for skilled professionals who can organize and analyze that data and glean business insights from it. The number of people with these skills is still relatively low, which pushes demand for them even higher.
Business Data Science = Compressing billions of data points into one “yes” or “no”
Struggle of Data Scientists in Pakistan
Companies in Pakistan hire data scientists without a suitable infrastructure in place to get value out of Artificial Intelligence. These companies are not actually in a position to make use of the data they have. This contributes to the Cold Start Problem in AI. The data scientist likely came in to write efficient algorithms that drive insight, but can’t do this because their first job is to sort out the data infrastructure and produce analytic reports.
Even though data has tremendous power to improve a business, many companies fail to prioritize collecting data from their customers. Data scientists join a company to change it through data but are disappointed when they realize they’re merely dealing with garbage data. The company likely doesn’t know just how messy its data is, so it can’t understand why the data scientist cannot produce “actionable insights.”
Meanwhile, the company may have only wanted an infographic that it could present in a meeting. It then gets frustrated because it doesn’t see value being delivered quickly enough, and all of this leaves the data scientist unhappy in their role.
Reasons Companies Should Collect Data
Companies often consider better data analytics in their operations for the very first time when they see others growing by understanding their target consumers through data. Until then, perhaps they do their financials in Excel and their site analysis in Mixpanel or Google Analytics.
The following are good reasons for collecting data:
- If you need to analyze data from different sources. At some point in your company’s life, you will need to combine data from different internal tools to make better, more informed business decisions.
- If you need to separate your analytical data from your transactional data. If you collect activity logs or other potentially useful pieces of information in your app, it’s probably not a good idea to store this data in your app’s database and have your analysts work on the production database directly.
- If your original data source is not suitable for querying. For example, the vast majority of business intelligence (BI) tools do not work well with NoSQL data stores like MongoDB. This means that applications that use MongoDB on the backend need their analytical data to be transferred to a data warehouse, in order for data analysts to work effectively with it.
- If you want to increase the performance of your most-used analytical queries. If your transactional data comprises hundreds of thousands of rows, it’s probably a good idea to create summary tables that aggregate that data into a more queryable form. Not doing so will make queries slow, and will also place an unnecessary burden on your database.
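To make the summary-table idea concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. The orders table, its columns, and the sample rows are all invented for illustration, and an in-memory SQLite database stands in for a real data warehouse.

```python
import sqlite3

# Hypothetical transactional table; names and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, day TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, day, amount) VALUES (?, ?, ?)",
    [
        ("alice", "2024-01-01", 20.0),
        ("bob", "2024-01-01", 35.0),
        ("alice", "2024-01-02", 15.0),
    ],
)

# Build the summary table once, so analysts can query the small aggregate
# instead of scanning every transactional row each time.
conn.execute(
    """
    CREATE TABLE daily_sales AS
    SELECT day, COUNT(*) AS order_count, SUM(amount) AS revenue
    FROM orders
    GROUP BY day
    """
)

daily_sales = conn.execute(
    "SELECT day, order_count, revenue FROM daily_sales ORDER BY day"
).fetchall()
print(daily_sales)
```

In a real setup the summary table would live in a separate analytical database and be refreshed on a schedule, which also keeps analysts off the production database.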
To store this data, you can go to one of the three major cloud providers (Google Cloud, Amazon Web Services, or Microsoft Azure) and pick its cloud-based data warehousing solution. For Google Cloud, this is BigQuery. For AWS, this is Redshift. For Azure, this is Azure SQL Data Warehouse (since rebranded as Azure Synapse Analytics).
The Business Strategy
Businesses can mainly achieve two things with data science.
- Understanding their audience better. Learning about their needs, their struggles, their motivations, their habits, and their relationships to the product or service.
- Using this understanding to create a better product or service and turning that into profit.
Your number one priority should be to help your users. Because of that, your product or service will flourish. And that better product or service will bring you more users, more returning users, and eventually more revenue.