AI bias and Data Scientists’ responsibility to ensure fairness

As artificial intelligence creeps out of data labs and into the real world, we find ourselves in an era of AI-driven decision-making. Whether it’s an HR system helping us sort through hundreds of job applications or an algorithm that assesses the likelihood of a criminal becoming a recidivist, these applications are helping shape our future.   

AI-based systems are more accessible than ever before. As they become available across more industries, questions arise about fairness and how it can be ensured in these systems. Understanding how to detect and avoid bias in AI models is a crucial research topic, and it grows more important as AI expands into new sectors.

“AI systems are only as good as the data we put into them.”

IBM Research

AI builds upon the data it is fed. While AI can often be relied upon to improve human decision-making, it can also inadvertently accentuate and bolster human biases. What is AI bias? AI bias occurs when a model reflects implicit human prejudice relating to race, gender, ideology or other characteristics.

Google’s ‘Machine Learning and Human Bias’ video provides a tangible example of this idea. Picture a shoe. Your idea of a shoe may be very different from another person’s idea of a shoe (you might imagine a sports shoe whereas someone else might imagine a dressy shoe). Now imagine teaching a computer to recognize a shoe: you might teach it your idea of a shoe, exposing it to your own bias. This is comparable to the danger of a single story.

“The single story creates stereotypes, and the problem with stereotypes is not that they are untrue, but that they are incomplete. They make one story become the only story.”

Chimamanda Ngozi Adichie

So, what happens when we provide AI applications with data that is embedded with human biases? If our data is biased, our model will replicate those unfair judgements. 

Here we can see three examples of AI replicating human bias and prejudice:  

  • Hiring automation tools: AI is often used to support HR teams by analyzing job applications, and some tools rate candidates by observing patterns in past successful applications. Bias has appeared when these tools recommended male candidates over female candidates, having learned from data in which women were under-represented.
     
  • Risk assessment algorithms: courts across America are using algorithms to assess the likelihood of a criminal re-offending. Researchers have pointed out the inaccuracy of some of these systems, finding racial bias: black defendants were often predicted to be at a higher risk of re-offending than others.
     
  • Online social chatbots: several social media chatbots built to learn language patterns have been removed and discontinued after posting inappropriate comments. These chatbots, built using Natural Language Processing (NLP) and Machine Learning, learned from interactions with trolls and couldn’t filter out indecent language.

The three scenarios above illustrate AI’s potential to be biased against groups of people. The key underlying factor in these results is biased data. However inadvertent the outcome, these systems did exactly what they were trained to do: they made sense of the data they were given.

Data reflects social and historical processes and can easily operate to the disadvantage of certain groups. When trained with such data, AI can reproduce, reinforce, and most likely exacerbate existing biases. As we move into an era of AI-driven decision-making, it is increasingly crucial to understand the biases that exist and take preventive measures to avoid discriminatory patterns.

Understanding the types of bias and how to detect them is crucial for ensuring equality. Google identifies three categories of bias:

  • Interaction bias: when systems learn biases from the users driving the interaction. For example, chatbots that are taught to recognize language patterns through continued interactions.
  • Latent bias: when data contains implicit biases against race, sexuality, gender etc. For example, risk assessment algorithms that show signs of racial discrimination.
  • Selection bias: when the data you use to train the algorithm over-represents one population. For example, where men are over-represented in past job applications and the hiring automation tool learns from this.
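One simple first check for selection bias is to look at how each group is represented in the training data before any model is trained. The sketch below is only an illustration: the `group_shares` helper, the `gender` field and the sample records are hypothetical and do not come from any specific tool.

```python
from collections import Counter


def group_shares(records, key):
    """Return each group's share of the records as a fraction of the total."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: count / total for group, count in counts.items()}


# Hypothetical past job applications; field name and values are illustrative only.
applications = [
    {"gender": "male"},
    {"gender": "male"},
    {"gender": "male"},
    {"gender": "female"},
]

shares = group_shares(applications, "gender")
for group, share in shares.items():
    print(f"{group}: {share:.0%} of training examples")
```

A heavily skewed share does not prove that a model will be biased, but it is a signal that the data deserves a closer look.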

So how can we become more aware of these biases in data? In Machine Learning literature, ‘fairness’ is defined as “A practitioner guaranteeing fairness of a learned classifier, in the sense that it makes positive predictions on different subgroups at certain rates.” Fairness can be defined in many ways, depending on the given problem, and identifying the criteria behind it involves social, political, cultural, historical and many other tradeoffs.
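To make the idea of “positive predictions on different subgroups at certain rates” concrete, here is a minimal sketch that computes the positive prediction rate for each subgroup and the gap between them. The function name, the group labels and the toy predictions are assumptions made for illustration; which gap counts as acceptable depends on the problem.

```python
from collections import defaultdict


def positive_rate_by_group(predictions, groups):
    """Fraction of positive (1) predictions within each subgroup."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for prediction, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += prediction
    return {group: positives[group] / totals[group] for group in totals}


# Toy classifier outputs (1 = positive prediction) and hypothetical group labels.
predictions = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = positive_rate_by_group(predictions, groups)
gap = max(rates.values()) - min(rates.values())
print(rates, gap)
```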

Consider the fairness of assigning groups to certain classifications. For example, is it fair to rate different groups’ loan eligibility the same even if they show different rates of payback? Or is it fair to give them loans in proportion to their payback rates? Even in a scenario like this, people might disagree as to what is fair or unfair. Understanding fairness is a challenge, and even with a rigorous process in place, it is impossible to guarantee. For that reason, it is imperative to measure bias and, consequently, fairness.
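The loan question can be sketched with two different measurements on the same decisions: the overall approval rate per group, and the approval rate among applicants who did pay back (the criterion usually called equality of opportunity). The data below is invented, and the sketch assumes the payback outcome is known for every applicant, which is rarely true in practice for rejected applicants.

```python
def approval_rate(approved):
    """Share of all applicants in the group who were approved."""
    return sum(approved) / len(approved)


def approval_rate_among_repayers(approved, repaid):
    """Share of applicants who paid back that were approved."""
    decisions = [a for a, r in zip(approved, repaid) if r == 1]
    return sum(decisions) / len(decisions)


# Hypothetical groups: 1 = approved / paid back, 0 = rejected / did not pay back.
group_x = {"approved": [1, 1, 0, 0], "repaid": [1, 1, 1, 0]}
group_y = {"approved": [1, 1, 0, 0], "repaid": [1, 1, 0, 0]}

for name, group in (("x", group_x), ("y", group_y)):
    overall = approval_rate(group["approved"])
    among_repayers = approval_rate_among_repayers(group["approved"], group["repaid"])
    print(name, overall, among_repayers)
```

In this toy data both groups are approved at the same overall rate, yet applicants who would pay back are approved less often in group x than in group y. The same decisions look fair under one criterion and unfair under the other, which is why people can reasonably disagree.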

Strategies for measuring bias exist across many sectors of society. In cinema, for example, the Bechdel test assesses whether movies contain a gender bias. Similarly, in AI, means of measuring bias have started to emerge. Aequitas, AI Fairness 360, Fairness Comparison and Fairness Measures, to name a few, are resources data scientists can leverage to analyze and improve fairness. Aequitas, for example, facilitates auditing for fairness, helping data scientists and policymakers make informed and more equitable decisions. Data scientists can use these resources to evaluate fairness and help make their predictions more transparent.

The Equity Evaluation Corpus (EEC) is a good example of a resource that allows data scientists to automatically assess fairness in an AI system. This dataset, which contains over 8,000 English sentences, was specifically crafted to tease out biases towards certain races and genders. The dataset was used to automatically assess 219 NLP systems for predicting sentiment and emotion intensity. Interestingly, the researchers found that more than 75% of the systems they analyzed predicted higher intensity scores for a specific gender or race.
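The general idea behind this kind of audit can be sketched as follows: build sentences that differ only in the word referring to a person, score each one, and compare the average scores. This is not the EEC itself or its evaluation code. The template, the subject lists and `toy_intensity_score` (a stand-in for a real NLP system, deliberately biased here so the check has something to find) are all hypothetical.

```python
def toy_intensity_score(sentence):
    """Hypothetical stand-in for an NLP system; deliberately biased for illustration."""
    score = 0.5
    if "angry" in sentence:
        score += 0.3
    if sentence.startswith("She "):
        score += 0.1  # injected bias: same emotion, higher score for "She"
    return score


def mean_score(score_fn, template, subjects):
    """Average score over sentences that differ only in the subject."""
    scores = [score_fn(template.format(person=subject)) for subject in subjects]
    return sum(scores) / len(scores)


template = "{person} feels angry."
female_mean = mean_score(toy_intensity_score, template, ["She", "My sister"])
male_mean = mean_score(toy_intensity_score, template, ["He", "My brother"])

# A nonzero gap means the system scores identical emotions differently by gender.
gap = female_mean - male_mean
print(f"female: {female_mean:.2f}, male: {male_mean:.2f}, gap: {gap:.2f}")
```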

As AI adoption increases rapidly across industries, there is a growing concern about fairness and how human biases and prejudices are incorporated into these applications. As we’ve shown here, this is a crucial topic that is gaining more and more traction in both scientific literature and across industries. Understanding the human biases that percolate into our AI systems is vital to ensuring positive change in the coming years.

If you’re interested in learning more about fairness in AI, here are some other interesting references:

https://fairmlbook.org/ 
http://papers.nips.cc/paper/6374-equality-of-opportunity-in-supervised-learning.pdf
https://papers.nips.cc/paper/6316-satisfying-real-world-goals-with-dataset-constraints.pdf 

How AI can help to understand the customer

Ahead of us is a significant change in the way brands use customer experience (CX).  We are already starting to see the switch from companies competing on price and product to competing on CX. But what exactly do we mean by CX? Gartner defines CX as a customer’s perceptions and feelings caused by the one-off and cumulative effect of interactions with a supplier’s employees, systems, channels or products.   

Previously, the communication flow between customers and companies was either in person, in writing or via a telephone call to the support line. Now, there are increasingly more ways customers can interact with brands, and when they do, they expect a high-quality experience “on demand.” According to the 2017 Gartner Customer Experience in Marketing Survey, 81% of marketing leaders were expected to mostly or completely compete based on customer experience by 2019.

There are many tools already giving insight into CX, such as NPS and Customer Success Scores. However, when companies need to make quick decisions, real-time insights are what help decision makers. Technologies such as AI are now gathering these insights by allowing companies to organize and categorize data based on business needs, helping to make sense of all these interactions.

To understand the customer from a CX perspective, and give some real-world examples, we can filter down a myriad of AI technologies and categorize them into three buckets: 

  • Speech Analytics: understanding, interpreting and analyzing voice conversations. Example: understand sentiment, IVR systems.
  • Image: capturing, processing and analyzing images, photos and video. Example: customer patterns, social media image analysis. 
  • Natural Language Processing: analyzing human expression and emotion. Example: text, chatbot, email analysis.  

The below table shows CX use cases and examples of these AI technologies in action:  

Source: Gartner 2019

Are data scientists the only ones needing to understand these technologies? No, it’s extremely valuable for both marketing and CX teams to gain an understanding of these tools. Every company has unique needs depending on CX goals and business objectives. Teams need to make a well-informed decision and understand which tools are most useful to their business, which will ultimately lead to more accurate decision-making and a customer-first approach.

Now, are people rushing to adopt these new AI technologies for CX? Gartner’s 2018 Enterprise AI survey revealed that, of the businesses already deploying AI, 26% are implementing it to improve customer experience. Although it may not seem urgent to start implementing these technologies right away, it’s important that businesses are aware of these AI applications and start to familiarize themselves with them.

A good place to start is mapping out a customer journey and finding the ‘dark spots’. These are the areas that could benefit from deeper real-time insights, such as understanding the mood of a customer when they are talking with a chatbot. Having these insights will allow you to hand over the conversation to a human based on the customer’s emotion.  

Companies are dealing with an increasing number of interactions happening across multiple channels and devices. With customer expectations at an all-time high, it’s not easy to connect all these touch points and deliver an excellent customer experience. AI can help provide rich insights, allowing you to gain a faster, real-time understanding and optimize the overall customer journey.

Recapping the week at MWC19

Mobile World Congress (MWC) 2019, the world’s largest exhibition for the mobile industry, welcomed leaders from mobile operators, device manufacturers, technology providers, vendors and more.  

This year’s event saw a focus on two core concepts: 5G and Artificial Intelligence. It was said to be one of the most important events in recent times for the mobile industry. In the days leading up to the show, a warm buzz of anticipation filled the air as attendees were eager to hear about the new groundbreaking technologies. We were excited to be surrounded by leaders in the field and pleased to be a co-exhibitor for the Washington State Delegation of Commerce. 

With a large number of keynote presentations, panel discussions and exhibitors, there were many takeaways from the event. A hot topic that continued to emerge was AI bias. On day two I was able to discuss this topic with other like-minded people: Elena Fersman (Ericsson), Beena Ammanath (HPE), Beth Smith (IBM) and Kriti Sharma (Sage), who are all working towards an unbiased future for AI.

We discussed ‘Democratizing AI and Attacking Algorithmic Bias’. The discussion of bias in AI continued throughout the event as many people came to speak with us about how to overcome this problem. If you missed this talk and want to hear more, see an edited version here.  

We also attended the Applied AI Forum: an exclusive conference that brought together telecom leaders, AI specialists, start-ups and academics, with an aim to spur debate and discussion on the practice of AI across the digital economy. Google and IBM Watson held an interesting panel discussion that explored ‘Applied AI: new trends and strategies’. In this forum, we were able to share lessons learned and discuss recent breakthroughs with both data scientists and global leads from several large enterprise companies.  

Another key highlight of MWC was our exciting hiring announcement! On the second day, we released our plans for the year: to double the size of our company by the end of 2019. With the rapid growth of AI applications seen across all industries, there is an increasing demand for high-quality data. And with this, our company is growing faster than ever. We are looking for more talent to join our team in Portugal, Japan, and the United States. See our careers page for more information.  

What a big week it was as we move into a new era of Intelligent Connectivity. A huge thanks to GSMA, a body representing the interests of mobile networks globally, and everyone we met at the event.  

We’d love to continue the discussions we had, especially around the topic of bias in AI. Reach out at pr@definedcrowd.com; we’d be glad to hear from you. We are already thinking about what next year might hold.

Job description: talent for a smarter AI

We’re in a huge growth stage and are looking for talent to join our global team in Portugal, Japan, and the United States. Check out our careers page for current openings.

We accelerate the evolution of Artificial Intelligence initiatives by delivering high-quality training data to enterprise companies. We are investing heavily in our business and the people to make this happen. Over the coming months, we’ll be searching for professionals who are looking to embark on an exciting career while making a difference in AI.  

So, who are we? We’re an 80-employee startup based in Seattle, with offices in Lisbon, Porto, and Tokyo. Our CEO, Daniela Braga, founded DefinedCrowd in 2015 to fill a gap in the market by offering high-quality training data to help machine learning products reach the market at optimal quality and speed. With the rapid growth of AI applications and the high demand for this data, our business is growing so quickly that we are looking to nearly double our team by the end of 2019.

“We have a very ambitious goal –  to be the number one provider of data for AI in the world. This year will be crucial to achieving this goal, as we mature our product, grow our client base, and increase our partnerships with the companies that are leading the AI revolution.”

Founder and CEO, Dr. Daniela Braga

We are currently hiring for positions in the following departments: Development, Product, Marketing, and Operations, with several openings for Software Developers (Frontend, Backend, and Full Stack), QA Automation Engineers, and Machine Learning Engineers. These positions are available across our four offices and will play an important role in the expansion of the company’s product: an all-in-one data platform.

Earlier this year, DefinedCrowd was selected as one of CB Insights’ top 100 AI startups. Our client list includes many Fortune 500 companies, including BMW, Mastercard, Nuance, and Yahoo Japan. We also have partnerships with IBM, Microsoft, and Amazon.

“We are looking for the best talent to join our team in this exciting moment, and to be part of the construction of a smarter AI.”

Founder and CEO, Dr. Daniela Braga

If you’re interested, keep an eye on our careers page, http://careers.definedcrowd.com, where new jobs will be added throughout the year.

100 most promising AI startups

Japanese version available here

We’ve come a long way since forming in 2015. Starting out as a small team, we now have four offices worldwide – Lisbon, Porto, Tokyo, and Seattle – and continue to grow every day.  

Our unique platform has helped many successful companies feed their artificial intelligence applications with training data. Using human intelligence coupled with machine learning, we deliver project-specific, quality-guaranteed data.

Today, we’re proud to announce that DefinedCrowd is among CB Insights’ third annual list of 100 AI startups. A research team from CB Insights selected 100 startups based on the following factors: investor profile, market potential, partnerships, competitive landscape, and team strength. 

Source: CB Insights

Companies are categorized by focus area. These focus areas aren’t mutually exclusive and include core sectors such as telecommunications, government, retail and healthcare, as well as enterprise tech sectors such as training data (where we sit), software development, data management, and cybersecurity.

We are pleased to be among this group of incredible AI startups, selected from an extensive list of 3k+ AI companies, and look forward to seeing these companies grow.  

It’s been a great start to 2019, and we’re very thankful to everyone who has helped get us here.

The top 100 most promising startups in AI

English version available here

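DefinedCrowd’s “training data platform for AI” uses a workflow that combines human intelligence and machine learning to deliver the “training data” needed to develop and improve AI applications, optimized for each customer and, further, for each individual project.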

DefinedCrowd has been selected as one of the top 100 most promising startups in the AI field, as announced by US research firm CB Insights.

Source: CB Insights

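This is the third year of the announcement. This time, the top 100 were selected from a total of more than 3,000 AI-related startups after weighing and evaluating multiple factors, including investor profile, market potential, partnerships, competitive landscape, and company strengths.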

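These “top 100 AI startups” are categorized by the industry or technology area they focus on, and DefinedCrowd was selected as one of the companies in the “training data” category, part of “enterprise technology.” Our data platform supplies the training data needed by the artificial intelligence applications of many successful companies, combining human intelligence and machine learning to deliver project-specific, high-quality data.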

Daniela Braga’s journey with DefinedCrowd

Earlier this week, DefinedCrowd was featured in Jornal Económico, a premium financial publication in Portugal. We’ve translated the article from the original Portuguese for our English-speaking friends. Enjoy!

Original Article by
António Sarmento

Founded in Seattle, USA, DefinedCrowd is a startup specializing in training data for Artificial Intelligence. The company counts Amazon, IBM, and EDP as investors and clients. 

DefinedCrowd provides services so that data scientists can gather, structure, and enrich datasets for Artificial Intelligence, helping companies improve speed to market and the overall quality of their AI products. DefinedCrowd accelerates enterprise AI initiatives by combining machine learning technology with human-in-the-loop collection processes. Founded in August 2015 by entrepreneur Daniela Braga, the company is headquartered in Seattle, has R&D centers in Lisbon and Porto, and a sales office in Tokyo. 

Three months after its founding, the company opened its first R&D office at Startup Lisbon. Since then, DefinedCrowd has blossomed from an initial team of three employees to a workforce of more than 70 that is still growing.

In 2016, the company raised $1.1 million in seed funding, with investors such as Sony, Amazon Alexa Fund, Portugal Ventures, and Busy Angels.
In July 2018, DefinedCrowd closed a Series A funding round worth $11.8 million, led by Evolution Equity Partners. EDP Ventures, Mastercard and Kibo Ventures joined as new investors, while Sony, Amazon, Portugal Ventures and Busy Angels bolstered their investments with additional capital for the data company.

“It is important to raise capital if we want to move fast, especially in the technological sector.”   

Daniela Braga to Jornal Económico

This influx of capital is being used to accelerate product development and team growth. Two-thirds of DefinedCrowd’s 70 employees work out of Portugal. The company expects to add 80 more team members by the end of 2019.

Over the past six months, DefinedCrowd has announced three partnerships: a formal designation as an Amazon Alexa Skills partner, a product integration with IBM Watson Studio, and participation as a featured vendor in Microsoft’s co-sell program.

DefinedCrowd’s platform provides industry-agnostic data services and can support text, audio, and image annotation. As a result, the company’s clients span industries from Fintech to Retail, Healthcare, Utilities, and the Internet of Things. Its client portfolio consists mostly of Fortune 500 companies, including BMW, MasterCard, EDP, José de Mello Saúde, SoftBank, Yahoo Japan, Randstad, and Nuance.

DefinedCrowd’s goals are ambitious. The company aims to become the world’s number one AI data provider by expanding its client base and forging new partnerships with industry leaders.

With a degree in Portuguese Language and Literature, Daniela Braga has spent her career examining the rigorous use of language, the perfect foundation for her business. “We deal daily with data in 70 languages and dialects. Our clients need, at a minimum, native-level speakers and sometimes even require linguists or specialists in language sciences for all of them,” says the entrepreneur.

After graduating with a master’s degree in applied linguistics, she went on to earn a PhD in Speech Technologies at the Faculty of Engineering at the University of Porto and taught at the University of A Coruña for two years before joining Microsoft (for which she worked in Portugal, China and the United States).

After leaving Microsoft in 2013, Daniela moved to American company Voicebox. Simultaneously, she was invited to teach Data and Crowdsourcing for Speech Technologies at the University of Washington. It was during this time that she saw the gap between the Artificial Intelligence data scientists wanted to develop and the training data available to build it. She decided to found her own company as a result.

Waving a well-paid job goodbye, and with few personal resources, she started meeting with investors in Seattle and quickly received an initial check: $200,000 in financing to start her business. A business that is now signing contracts with some of the largest companies in the world.

DefinedCrowd is in constant growth and employee numbers have been updated to reflect our current position.