2010 was a big year. KDnuggets, a leading site on AI, data science, and other advanced analytics topics, has been attracting hard-core data scientists for decades. Every year, KDnuggets conducts a survey on “Data Mining / Analytic Tools Used.”
Until 2010, proprietary vendors like SAS, the biggest player in the “old guard” category of advanced analytics, topped the list. Founded in 1976 in North Carolina, SAS grew rapidly from its roots in analyzing agricultural data, expanding across industries and around the globe. By 2010, SAS had become an analytics behemoth, with well-entrenched statistics and machine learning products, and an army of SAS coders who sought to use their favorite tool on every job.
In the 2010 survey, R – the free statistics environment – rocketed to #2 on the list, past SAS. The growth of R had been happening for a few years, and the survey provided one data point among many that open-source analytics had reached critical mass. It wasn’t long before Python became a popular data science tool, recently surpassing R as the most common analytics language.
Open-source analytics started as a revolt against the high prices and closed nature of the proprietary systems. The story of R in 2010 is a grassroots story: practitioners chose free or near-free tools to teach and apply data science techniques. The success of open source software like Linux, and of companies like Red Hat that built businesses around it, meant that open source would eventually expand, this time to advanced analytics.
Why should you consider open source for your business?
There are three reasons why open source makes sense for most businesses: cost, talent, and innovation.
1. The cost advantage of open source software, even counting all related costs, is impossible to ignore.
Proprietary vendors will stress that ‘free is not free.’ To be fair, software costs should include training, related software and hardware, and the cost of changing business practices. Yet even when factoring in these related costs, adopting open source tools like Python libraries is closer to free than proprietary vendors’ offerings.
If your project is critical to your business, you don’t want just free software – you want a proportionate investment in the analytics infrastructure that supports your strategy. Virtualization and open source have taught us that traditional conventions of hardware and software costs can be upended.
2. The talent shortage is the biggest issue facing analytics teams today.
The issue that tops the list for data science and advanced analytics is a lack of skills. Finding and keeping the right people is expensive and extremely challenging, particularly under business deadlines. Today, data scientists learn Python and R.
Often organizations use a partner, like Syntelli Solutions, instead of building teams to create and maintain data science solutions. Even in this situation, open source is attractive. For solutions to be maintainable, it’s critical to build them in generally accepted languages like R and Python.
3. Open source software is about people – highly motivated, innovative people.
Perhaps the most important reason for selecting open source over proprietary software like SAS is innovation. A massive company does play an important role in innovation. After all, these sorts of companies typically invest 20% or more of their revenue in R&D.
The difference is that armies of people behind successful open source software innovate continuously, adding real-world capabilities to solve specific problems. What’s more, open source embraces transparency in this process, so you don’t need to hope that a vendor will come around to the enhancement you need, or that it won’t charge you for it when it arrives.
Syntelli Solutions specializes in open source software, languages, and engines. Learn more about how we can help you navigate your technology decisions and bring analytics-driven value to your market quickly.