ASTROMAN - Consulting, Executive Search


Photo courtesy of deepsense.ai

Warsaw, Poland and Palo Alto, CA, USA - August 24, 2018

by Konrad Budek and Patryk Miziuła, deepsense.ai

For companies seeking ways to test AI-driven solutions in a safe environment, running a competition for data scientists is a great and affordable way to go – when it’s done properly.

According to a McKinsey report, only 20% of companies consider themselves adopters of AI technology while 41% remain uncertain about the benefits that AI provides.

https://www.mckinsey.com/

Considering the cost of implementing AI and the organizational challenges that come with it, it’s no surprise that smart companies seek ways to test the solutions before implementing them and get a sneak peek into the AI world without making a leap of faith.

That’s why more and more organizations are turning to data science competition platforms like Kaggle, CrowdAI and DrivenData.

https://www.kaggle.com/

https://www.crowdai.org/

https://www.drivendata.org/

Making a data science-related challenge public and inviting the community to tackle it comes with many benefits:

• Low initial cost – the company needs only to provide data scientists with data, pay the entrance fee and fund the award. There are no further costs.

• Validating results – participants provide the company with verifiable, working solutions.

• Establishing contacts – a lot of companies and professionals take part in Kaggle competitions. Those who tackle your challenge may become potential vendors for your company.

• Brainstorming the solution – data science is a creative field, and there’s often more than one way to solve a problem. Sponsoring a competition means you’re sponsoring a brainstorming session with thousands of professional and passionate data scientists, including the best of the best.

• No further investment or involvement – the company gets immediate feedback. If an AI solution proves effective, the company can move forward with it; otherwise, its involvement ends with funding the award, and it avoids further costs.

While numerous organizations – big e-commerce websites and state administrations among them – sponsor competitions and leverage the power of the data science community, running a competition is not at all simple.

https://deepsense.ai/image-classification-sample-solution-kaggle/

https://deepsense.ai/deep-learning-right-whale-recognition-kaggle/

An excellent example is the competition the US National Oceanic and Atmospheric Administration sponsored when it needed a solution that would recognize and differentiate individual right whales from the herd.

To check whether this was even possible, and how accurate a solution might be, the organization ran a Kaggle competition, which deepsense.ai won.
Ultimately, the most effective approach was the principle of facial recognition, applied to the topsides of the whales, which were obscured by weather, water and the distance between the photographer above and the whales far below.

Having won several such competitions, we have encountered both brilliant and not-so-brilliant ones.

That’s why we decided to prepare a guide for every organization interested in testing potential AI solutions in Kaggle, CrowdAI or DrivenData competitions.


Recommendation 1. Deliver participants high-quality data

The quality of your data is crucial to attaining a meaningful outcome. Without good data, even the best machine learning model is useless.
This also applies to data science competitions: without quality training data, the participants will not be able to build a working model.
This is especially challenging with medical data, where obtaining enough information is difficult for both legal and practical reasons.

• Scenario: A farming company wants to build a model to identify soil type from photos and probing results. Although there are six classes of farming soil, the company is able to deliver sample data for only four. Considering that, running the competition would make no sense – the machine learning model wouldn’t be able to recognize all the soil types.

Advice: Ensure your data is complete, clear and representative before launching the competition.
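
As a rough illustration of the soil scenario above, the following Python sketch checks whether every class the model is expected to recognize actually appears in the training labels. The function name, the class names and the sample labels are all hypothetical and invented for this example; a real check would run against the organizer's own label column.

```python
from collections import Counter


def missing_classes(labels, expected_classes, min_samples=1):
    """Return the expected classes that have fewer than min_samples examples."""
    counts = Counter(labels)  # a Counter returns 0 for labels it has never seen
    return sorted(c for c in expected_classes if counts[c] < min_samples)


# Hypothetical data: six expected soil classes, but only four are present.
expected = ["chalk", "clay", "loam", "peat", "sand", "silt"]
labels = ["clay", "sand", "clay", "loam", "peat", "sand", "loam"]

print(missing_classes(labels, expected))  # ['chalk', 'silt']
```

Raising min_samples turns the same check into a crude test for under-represented classes, which is usually worth running before the data is published.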


Recommendation 2. Build clear and descriptive rules

Competitions are put together to achieve goals, so the model has to produce a useful outcome. And “useful” is the point here.
Because those participating in the competition are usually not professionals in the field they’re producing a solution for, the rules need to be based strictly on the case and the model’s further use.
Including even basic guidelines will help them to address the challenge properly.
Without these foundations, the outcome may be technically correct but totally useless.

• Scenario: A map of the distribution of children below the age of 7 in a city will be used to optimize social, educational and healthcare policies. To make the mapping work, it is crucial to include additional guidelines in the rules: the areas mapped need to be bordered by streets, rivers, rail lines, district boundaries and other topographical obstacles in the city. Without such guidelines, many models may map the distribution by cutting the city into stripes 10 meters wide and a kilometer long. The segmentation is done, but the outcome is totally useless.

Advice: Think about usage and include the respective guidelines within the rules of the competition to make it highly goal-oriented and driven by common sense.


Recommendation 3. Make sure your competition is crack-proof

Kaggle competition winners take home fame and the award, so participants are motivated to win.
The competition organizer needs to remember that there are dozens (sometimes thousands) of brainiacs looking for “unorthodox” ways to win the competition.

Here are three examples:

• Scenario 1: A city launches a competition in February 2018 to predict traffic patterns based on historical data (2010-2016). The prediction has to be made for the first half of 2017, with the real data from that period serving as the benchmark. With a bit of googling, participants find that data, which makes it easy to fabricate a model that “predicts” with 100% accuracy. That’s why the city decides to provide an additional, non-public dataset to enrich the data and validate whether the models are really doing the predictive work.

However, competitions are often cracked in more sophisticated ways.
Sometimes data may ‘leak’: data scientists get access to data they shouldn’t see and use it to tailor a model that looks up the outcome rather than actually predicting it.

• Scenario 2: Participants are challenged to predict users’ age from internet usage data. Before the competition, the large company running it notices that every record carries a long alphanumeric ID with the user’s age embedded in it. Running the competition without deleting the ID would allow participants to crack it instead of building a predictive model.
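
A quick pre-launch check for this kind of leak can be sketched in Python as follows. It assumes, purely for illustration, that the target value appears as plain text somewhere inside the ID; the function names and the sample records are hypothetical. The check compares how often the real target shows up inside its own record's ID with how often a target taken from a different record does.

```python
def embedded_rate(ids, targets):
    """Fraction of records whose ID string contains the target value as text."""
    hits = sum(str(t) in str(i) for i, t in zip(ids, targets))
    return hits / len(ids)


def leak_report(ids, targets):
    """Return (real hit rate, baseline hit rate with targets shifted by one row)."""
    shifted = targets[1:] + targets[:1]  # pair each ID with another record's target
    return embedded_rate(ids, targets), embedded_rate(ids, shifted)


# Hypothetical records: the age sits in characters 5-6 of each ID.
ids = ["XK7Q34ZP", "AB9D27LM", "QQ2W61TR", "MN4E45YU"]
ages = [34, 27, 61, 45]

real, baseline = leak_report(ids, ages)
print(real, baseline)  # 1.0 0.0
```

A real hit rate far above the baseline suggests the column encodes the answer and should be dropped or replaced with a random identifier. This simple substring test will not catch encoded or hashed values, so it complements a manual review rather than replacing it.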

Benchmark data is often shared with participants to let them polish their models.
By comparing submitted predictions with the scores the benchmark returns, it is sometimes possible to reverse-engineer the outcome.
The practice is called leaderboard probing and can be a serious problem.

• Scenario 3: The competition calls for a model to predict a person’s clothing size based on height and body mass. To get the benchmark, the participant has to submit 10 sample sizes. The benchmark then compares the outcome with the real size and returns an average error. By submitting properly selected numbers enough times, the participant cracks the benchmark. Anticipating the potential subterfuge, the company opts to provide a public test set and a separate dataset to run the final benchmark and test the model.
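
To show why this matters, here is a minimal Python sketch of leaderboard probing against a benchmark like the one in Scenario 3. It assumes the benchmark returns the mean absolute error over 10 hidden integer sizes and that the attacker knows a lower and an upper bound on them; the function names, the bounds and the hidden values are all hypothetical.

```python
def make_benchmark(true_sizes):
    """Return a scoring function that reveals only the mean absolute error."""
    def score(submission):
        errors = [abs(p - t) for p, t in zip(submission, true_sizes)]
        return sum(errors) / len(errors)
    return score


def probe(score, n, low, high):
    """Recover n hidden integer values in [low, high] using n + 1 submissions."""
    base = score([low] * n)                # every error equals true - low
    recovered = []
    for i in range(n):
        guess = [low] * n
        guess[i] = high                    # error at position i becomes high - true
        delta = (score(guess) - base) * n  # equals high + low - 2 * true
        recovered.append(round((high + low - delta) / 2))
    return recovered


hidden = [36, 38, 40, 42, 44, 38, 40, 46, 36, 42]  # hypothetical true sizes
score = make_benchmark(hidden)
print(probe(score, len(hidden), low=0, high=100) == hidden)  # True
```

Eleven submissions are enough to read out all ten answers without any model at all. Scoring the final ranking on a separate dataset that participants never receive feedback on, as the company in the scenario does, removes the signal this attack relies on.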

Advice: Look for every possible way your competition could be cracked and never underestimate your participants’ determination to win.


Recommendation 4. Spread the word about your competition

One of the benefits of running a competition is that you get access to thousands of data scientists, from beginners to superstars, who brainstorm various solutions to the challenge.

Playing with data is fun and participating in competitions is a great way to validate and improve skills, show proficiency and look for customers.

Spreading the word about your challenge is almost as important as designing the rules and preparing the data.

• Scenario: A state administration is in need of a predictive model. It has come up with some attractive prizes and published the upcoming challenge for data scientists on its website. As these steps may not yield the results it’s looking for, it decides to sponsor a Kaggle competition to draw thousands of data scientists to the problem.

Advice: Choose a popular platform and spread the word about the competition by sending invitations and promoting the competition on social media. Data scientists swarm to Kaggle competitions by the thousands. It stands to reason that choosing a platform to maximize promotion is in your best interest.


Conclusion

Running a competition on Kaggle or a similar platform can not only help you determine if an AI-based solution could benefit your company, but also potentially provide the solution, proof of concept and the crew to implement it at the same time.

Could efficiency be better exemplified?

Just remember, run a competition that makes sense.
Although most data scientists engage in competitions just to win or validate their skills, it is always better to invest time and energy in something meaningful. It is easier to spot whether processing the data makes sense than many companies running competitions realize.

Preparing a model that is able to recognize plastic waste in a pile of trash is relatively easy. Building an automated machine to sort the waste is a whole different story.

Although there is nothing wrong with probing the technology, it is much better to run a competition that will give feedback that can be used to optimize the company’s short- and long-term future performance.

Far too many competitions either don’t make sense or produce results that are never used.

Even if the competition itself proves successful, who really has the time or resources to do fruitless work?

Konrad Budek and Patryk Miziuła, deepsense.ai

Source: deepsense.ai

https://deepsense.ai
