A chatbot to help humans have healthier lives through conversation.

Status: Deprecated

In 2016 I launched bots4health, a personal project I used to learn about conversational interfaces. This side project was used by over 10,000 unique users. Bots4health used Chatfuel and Dialogflow to facilitate a conversational experience in Messenger. Eva’s goal was to help users have healthier lives through conversations.


The general public does not have access to health information it can digest, nor a way to share relevant public health data.

Technology has a great potential to directly connect health institutions with the populations they serve, but current solutions are not easy to use for the general public or, what’s more concerning, a large part of the healthcare workforce.


Conversational interfaces reach humans where they are, allow for a natural interaction based on words and sentences instead of buttons and forms, and establish a 1:1 communication channel that stays open over time, allowing for personal, contextual attention. In combination with data strategies, a conversational agent could be a game changer for epi data collection, health messaging, surveying, and other public health activities.


  • Delivering health information to users in a more natural way.
  • Validating the adequacy of conversational interfaces in healthcare.
  • Creating usable conversational interfaces in a pre-standards context.


  • Validate conversational interfaces in healthcare.
  • Do user research for conversational UX.
  • Understand a new interface that is going to change the way humans relate to technology forever.
  • Define workflows and develop a conversational methodology.


I designed and developed a Facebook Messenger chatbot named Eva and trained her on different features:

  • Setting up and receiving reminders, which is helpful for people who need to do exercises for rehab or patients with chronic diseases who need to build a habit to take their medication.
  • Collecting patient background information, specifically around period pain, as Eva was at first a women’s health chatbot.

Eva has been chatting with users since September 2016, which has allowed me to develop different experiences, to collect a large amount of data as well as to improve the conversation design and the Natural Language Processing training data.

In numbers: Healthy Goals Weekly Reminders

We often use our Facebook Messenger Chatbot Eva to try out different conversational strategies. These are the results of a particular experiment, the Healthy Goals Weekly Reminders.

Eva was first launched as a Spanish speaking chatbot in Facebook Messenger in September 2016. She was a women’s health chatbot. In a second version, Eva collected Mood Bits and was based on the fundamentals of journal therapy. In January 2019 we launched a new experience in our chatbot. This time we thought we could take advantage of the calendar and build a chatbot that supported healthy new year goals. We wanted to validate the hypothesis of chatbots being good tools to maintain healthy habits and to test Facebook Ads.

The users for this experiment came from Facebook ads and existing users that responded to a notification inviting them to join this new experience. 1234 users had conversations with the chatbot while this flow was active.

A number of these users were long term users of bots4health. For example, 36 of these users had been active users of the Mood Bit feature in 2017.

Mood Bits were daily reflections, a way to help users be more conscious about their habits and how they impacted their moods.

I implemented some changes in the bot for this experience: For example, I removed all other functionality (posts about bots, mood bits…) and redesigned the Onboarding of the chatbot.

Then, I added a sequence to send a weekly message for 6 weeks to measure engagement and results.

All of the NLP intents were left in the bot to answer questions in natural language, although the model was not trained for this specific sequence; most expressions understood were still medical references, questions about the bot, and unrelated inputs.

The new onboarding was built around the objective of defining healthy habits users wanted to stick to in 2019. 

On their first conversation, users defined a health-related goal. They could select one from our list or type text. The following goals were suggested:

  1. Healthier eating habits
  2. Practicing more sports
  3. Meditating
  4. Quitting smoking

The nutrition and quit-smoking goals had dedicated flows.

If their free text was related to the suggested goals, we could relabel the user attribute. Otherwise, users continued the flow without a goal and received only the general messages, none related to specific goals.
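As an illustration (not the actual Chatfuel setup), the relabeling step can be sketched as a simple keyword match. The keyword lists below are assumptions for the sake of the example:

```python
# Hypothetical sketch of the goal-relabeling step: map free text to one of
# the suggested goals via keyword matching, or return None so the user
# continues the flow without a goal. Keyword lists are illustrative.
GOAL_KEYWORDS = {
    "Healthier eating habits": ["eat", "food", "diet", "nutrition"],
    "Practicing more sports": ["sport", "exercise", "gym", "run"],
    "Meditating": ["meditat", "mindful"],
    "Quitting smoking": ["smok", "tobacco", "cigarette"],
}

def relabel_goal(free_text):
    text = free_text.lower()
    for goal, keywords in GOAL_KEYWORDS.items():
        if any(k in text for k in keywords):
            return goal
    return None  # no match: continue without a goal

print(relabel_goal("I want to stop smoking"))  # Quitting smoking
print(relabel_goal("learn the guitar"))        # None
```

In the real bot this mapping lived in Chatfuel attributes; the sketch only shows the decision being made.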

422 users defined a goal. Out of 1,235 people, this represents 34% of users. 193 selected “Healthier eating habits”, which is 45% of the users who indicated a goal and 16% of all users during the period.

One week after their first conversation, the chatbot sent them a message asking them about their success in achieving their goal and gave them a motivational quote.

Users could unsubscribe at any time. The chart below shows the number of users that stayed subscribed to the goal progress notification through the 6 weeks in which we ran the experiment.

40% of the users stayed with the chatbot after 6 weeks.

During this period they received notifications from the chatbot with questions that defined user attributes.

The open rate for the initial message was close to 80% and it decreased over time, with a minimum of 53% in week 5. The click rate was under 10% every week. The clickable buttons were Yes and No quick answers for the question “Do you feel you are closer to achieving your goal?”.

The chatbot performed significantly well when compared to email newsletters. According to Mailchimp, the average open rate for health-related newsletters in 2018 was at 20%, and the click rate at 2.18%.

By these measures, the chatbot was roughly three to four times as effective at getting users’ attention as traditional newsletters.

The chatbot included links to external sources. They were clicked 129 times, a click rate of 10.45% across all users who spoke to the bot while the experiment was live.

Below is a summary of clicks per link. The most popular link is followed by links to recipes that are part of the nutrition-specific conversational flow. Recipes received a total of 80 clicks, representing 62% of all clicks on external sources.

One interesting result concerns a specific recipe we presented to users both as a link to a post and as a link to a video. The post got 19 clicks and the video got 16. The difference is too small to be definitive; we recommend testing the performance of video and posts again.

There were 3 clicks on an invite to share the chatbot with their contacts and one click on an invite to check out another chatbot, Izzy.

User Attributes

There is a correlation between the number of sessions and the number of attributes set up by users. The more conversations the user has with the chatbot the more data points are known about them.

There are some outlying values that may belong to long term users of the chatbot that joined the new experience, or to people that proactively chat between or after sessions.
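The sessions-vs-attributes relationship mentioned above can be checked with a plain Pearson correlation. The numbers below are made up for illustration, not taken from the real data set:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative data: sessions per user vs. attributes known about them.
sessions = [1, 2, 3, 5, 8, 13]
attributes = [2, 3, 4, 6, 9, 15]
print(round(pearson(sessions, attributes), 3))
```

A value close to 1 would support the observation that more conversations mean more known data points per user.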

We looked at the variations in engagement for the three most popular goals. Below is a chart with the number of users who responded to the weekly messages each week.

| Goal | Users | Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | Week 6 |
|---|---|---|---|---|---|---|---|
| Do more sports | 104 | 47 | 35 | 15 | 15 | 18 | 12 |
| Quit Smoking | 47 | 22 | 12 | 10 | 8 | 2 | 5 |

Number of users that responded to weekly goal messages, per goal and week

We translated these figures into percentages of the total number of users who selected each of the goals. When the content stays the same, the engagement trend is generally decreasing over time.

The highest value is 47% for the first week for users who selected the goal “Quit Smoking” and the lowest is 4% for the same group in week 5.

| Goal | Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | Week 6 |
|---|---|---|---|---|---|---|
| Do more sports | 45% | 34% | 14% | 14% | 17% | 12% |
| Quit Smoking | 47% | 26% | 21% | 17% | 4% | 11% |

Percentage of users that responded to weekly goal messages, per goal and week
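The percentage table is simply each week’s responder count divided by the goal’s user count. A quick check of the figures:

```python
# Recompute the percentage table from the raw weekly response counts.
weekly_responses = {
    "Do more sports": (104, [47, 35, 15, 15, 18, 12]),
    "Quit Smoking": (47, [22, 12, 10, 8, 2, 5]),
}

for goal, (users, weeks) in weekly_responses.items():
    pcts = [round(100 * w / users) for w in weeks]
    print(goal, pcts)
```

Running this reproduces the published percentages (45%, 34%, … for sports; 47%, 26%, … for smoking).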

When we put these percentages in a chart we can see the trend is descending but there is an interesting bump in week 3 that is delayed to week 5 for the users who indicated they wanted to quit smoking. 

It’s also interesting to see that all groups end close to a 10% engagement rate on week 6. It would be interesting to run longer experiments to see how the trends develop.

Natural Language Processing

Of the 422 users who indicated a goal, 226 users (53%) attempted to converse in natural language. This validates the assumption:

Chatbots may rely heavily on button-based navigation, but NLP is necessary to satisfy the preferences of most users.

The chatbot responded to all of the messages, but it is not possible to know how many of the answers were satisfactory for this specific cohort.

Below is an extract from a real conversation. The user said “I have 1 problem” and the system returned the answer programmed for the intent health.UTI. This is a false positive.

Conversation with a user

This expression is counted as responded by the system, and the match is added to the training interface for a human to review.

When we looked at the intent setup in the NLP provider, Dialogflow, we did not see this specific expression in the list of matched expressions for the intent.

Dialogflow intent training phrases

Some popular expressions, like “I have a headache”, are usually well handled by the bot. Answers to health-related questions are always cautious, and we often recommend that users check with their doctors for diagnosis and treatment.

Some conversations include typos, whether intentional or accidental. These are redirected to the default answer along with a suggestion to say “set goals”.

In a sample of 1000 expressions collected by Chatfuel, we identified 812 unique spellings. When ignoring differences in capitalization, we counted a total of 791 unique expressions.
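Counting unique spellings, with and without case sensitivity, is a one-liner over the exported expressions. A sketch with toy data in place of the real Chatfuel export:

```python
# Count unique spellings in a list of user expressions, first exactly and
# then ignoring capitalization. The sample list is illustrative only.
expressions = ["Hello", "hello", "HELLO", "set goals", "Set Goals", "bye"]

unique_exact = len(set(expressions))
unique_caseless = len({e.lower() for e in expressions})

print(unique_exact)     # 6
print(unique_caseless)  # 3
```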

The free text input is a mix of different kinds of input:

  • Questions written in natural language. There are 145 question marks in the data set, so roughly 18% of the unique inputs were questions written in natural language.
  • Questions written as a search query.
  • Statements and commands.
  • Single words.
  • Numbers.
  • Emoji. 17 inputs were emoji-only, representing about 2% of all inputs.
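The question-mark and emoji-only shares can be estimated mechanically. A sketch: the emoji check below uses a simple Unicode-range heuristic, which is an assumption rather than a full emoji specification:

```python
def is_question(text):
    return "?" in text

def is_emoji_only(text):
    # Crude heuristic: every non-space character sits in common emoji
    # code-point ranges. Real emoji detection is more involved.
    stripped = text.replace(" ", "")
    return bool(stripped) and all(
        0x1F300 <= ord(c) <= 0x1FAFF or 0x2600 <= ord(c) <= 0x27BF
        for c in stripped
    )

inputs = ["what is a healthy breakfast?", "recipes", "\U0001F600", "I love tacos"]
print(sum(is_question(i) for i in inputs))    # 1
print(sum(is_emoji_only(i) for i in inputs))  # 1
```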

Sentiment Analysis

We used a third-party app to analyze the sentiment of the natural language input. The results were not accurate, so no further investigation has been done in this direction.

New Intents

More than 40 new intents were identified. Training has not been completed for all of them. 

New intents are generated for expressions that fall under the following categories:

  1. Primary: related to the main goal of the conversation (e.g. “I need to add a food restriction”, “What is a good example of a healthy breakfast?”)
  2. Secondary: related to the behind-the-scenes of the conversation (e.g. “what medical basis is there to this meal plan?”)
  3. Navigation: attempts by users to navigate the conversation using natural language
  4. Turing: experiments to test the intelligence of the chatbot (e.g. “How do you start loving yourself when yiu hate yourself?”)

What does Eva do?

Eva helps users keep track of their progress in their new year goals. Each of the three available options has different sequences and dynamics, with “Eat Better” being the one we have dedicated the most love to.

Users who indicate they want to improve how they eat receive the weekly reminders (“How are you doing in reference to your goal?”) but also receive meal recommendations every few days, and a monthly message with seasonal fruits and vegetables and a recipe.

Besides this, Eva can keep some sort of conversation with people that are curious about automated conversations and with trolls. Eva is far from being as good at small talk as chatbots like Mitsuku, but she’s doing her best and she keeps learning new things every week. Eva has been programmed to answer to things like “give me recipe ideas” or “I love tacos” or “what is the meaning of life?”, and her small talk skills improve every day.

Food for thought: Each one of these answers is written by a human based on the expressions sent by users. More on the training process below!

Eva is a great example of changing our mind, pivoting, and iterating continuously. In the past two years, Eva has been a pill reminder bot, a women’s health bot, and a health bots newsletter. When we started we knew what we thought users wanted. Then we faced real users and our minds were changed over and over again.

What is Eva made of?

Eva lives in Facebook Messenger, is built with Chatfuel, and uses the Dialogflow NLP API to respond to natural language expressions sent by users. In this article, we will use the term “Expressions” instead of “Natural Language Expressions” because we assume users don’t come to Eva with binary code, and we treat all programmatic expressions as strings Eva will ignore.

Conversation Flow

The main conversation is designed in Chatfuel. It consists of an Onboarding flow, where users are introduced to the chatbot and asked about their health goals, and two sets of notification sequences. The first set is sent every 3 days to users interested in eating healthier and comes with recipes; the second is sent every 7 days to all users that have set a goal, asks if they feel they have made progress towards it, and sends them a motivational quote.

Processing Keywords

A lot of the user inputs are not processed by the chatbot because at certain steps, like the initial onboarding, we want to focus on the happy path and avoid distractions. For example, when we ask users for feedback about their meals, we just save their answer and proceed to say thank you, give them their quote, and ask them if they have any questions. This was a design decision. In other cases, like whenever Eva tells users she can answer their questions, we want to make sure users can start a conversation.

We have configured some keywords in the so-called “Chatfuel AI” so that they are automatically sent to a block of the conversation, and any expressions that we want to recognize and can’t be matched are sent to the Default Answer Block. In this block, we have configured an integration with a Dialogflow agent. The expressions are sent to Dialogflow, matched to an Intent, and returned to Chatfuel so that they are displayed to users.
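The routing logic described above (keyword matches a block, everything else falls through to the NLP agent) can be sketched like this. The block names and the `nlp_fallback` function are hypothetical stand-ins, not the real Chatfuel/Dialogflow wiring:

```python
# Hypothetical sketch of Chatfuel-style keyword routing with an NLP fallback.
KEYWORD_BLOCKS = {
    "set goals": "block_onboarding_goals",
    "recipes": "block_recipes",
    "help": "block_help",
}

def nlp_fallback(expression):
    # Stand-in for the Dialogflow call made from the Default Answer Block;
    # here it just returns a canned reply.
    return f"(Dialogflow answer for: {expression!r})"

def route(expression):
    key = expression.strip().lower()
    if key in KEYWORD_BLOCKS:
        return KEYWORD_BLOCKS[key]
    return nlp_fallback(expression)

print(route("Set Goals"))        # block_onboarding_goals
print(route("I have a headache"))
```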

Expressions and Intents

Intents are the different concepts Eva can understand. We currently have around 120 Intents in Dialogflow, and the list keeps growing as users have conversations with Eva and we discover more topics that are interesting to users.

Each Intent is related to many Expressions from users, over 1500! Each Intent also has a series of Answers that the chatbot will say in response. We use a naming convention for Intents to make the task of keeping them up to date easier, but we are always iterating in the way we manage this.

The current convention groups intents in families. Our naming convention uses a dot to separate the different sub-families. Some of the most relevant families are health (for health-related Intents), nav (for navigation Intents), and meta (for Intents about Eva, chatbots, and the meaning of life). We can match these and the rest of the intent families to the three different levels of the Conversational Design Pyramid Model.
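Because the convention is dot-separated, intent names can be grouped into families programmatically. A small sketch; apart from `nav.meta.areyouachatbot` and `health.UTI`, which appear in this article, the intent names are made up:

```python
from collections import defaultdict

intents = [
    "nav.meta.areyouachatbot",
    "health.UTI",
    "meta.meaningoflife",
    "nav.setgoals",
]

# Group intents by their top-level family (the part before the first dot).
families = defaultdict(list)
for name in intents:
    families[name.split(".")[0]].append(name)

print(dict(families))
```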

Many of you may be more comfortable using a text editor, and there’s good news: you can manage all of this from your editor. This is how this Intent looks in code:

    {
      "id": "70213998-4eb4-4453-9515-10c6f502234b",
      "name": "nav.meta.areyouachatbot",
      "auto": true,
      "contexts": [],
      "responses": [
        {
          "resetContexts": false,
          "affectedContexts": [],
          "parameters": [],
          "messages": [
            {
              "type": 0,
              "lang": "en",
              "speech": [
                "I\u0027m afraid I can\u0027t answer that question, Dave...",
                "Not at all. I\u0027m here looking for Sarah Connor."
              ]
            }
          ],
          "defaultResponsePlatforms": {},
          "speech": []
        }
      ],
      "priority": 500000,
      "webhookUsed": false,
      "webhookForSlotFilling": false,
      "lastUpdate": 1537487696,
      "fallbackIntent": false,
      "events": []
    }

Eva is trained to recognize certain words as items of an Entity. These Entities can be passed upon fulfillment to execute actions in a 3rd party service or to answer the user using that word. The different lists of entities grow over time as we identify new needs.

Named Entity Recognition (NER) is a fundamental area of work in Natural Language Processing. Through this task, a computer can understand the topic of a text or a sentence, identify the actions the user is asking it to do, and more. It does this in two distinct activities: first, it identifies the names in the sentence (which can be single words like “leg” or multiple words like “digestive system”) and then it classifies them based on an ontology pre-loaded in the system (e.g. our existing Entities). NER is still far from perfect: it relies on manually tagged data, and there are painful challenges, like disambiguation (e.g. different Entities represented by the same words), that still need to be resolved.
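A minimal dictionary-based sketch of those two steps (find the names, then classify them against a preloaded ontology). The tiny ontology here is an assumption for illustration:

```python
# Toy dictionary-based NER: longest-match lookup against a small ontology.
ONTOLOGY = {
    "leg": "body_part",
    "digestive system": "body_part",
    "ibuprofen": "medication",
}

def find_entities(sentence):
    text = sentence.lower()
    found = []
    # Try longer names first so "digestive system" wins over shorter matches.
    for name in sorted(ONTOLOGY, key=len, reverse=True):
        if name in text:
            found.append((name, ONTOLOGY[name]))
            text = text.replace(name, " ")
    return found

print(find_entities("My digestive system hurts after ibuprofen"))
```

Real NER systems are statistical and handle disambiguation; this sketch only illustrates the identify-then-classify split.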

We manage our Entities in Dialogflow, where we have configured most Entities for Automated Expansion so that we can take advantage of the Dialogflow superpowers. Thanks to this configuration, Dialogflow will apply its magic (mostly statistics) to identify words that are similar to the ones we added in the Entities definitions, and add them to the collection so that it keeps working with as little manual maintenance on our side as possible.

This is how each Entity looks if you manage it from a text editor:

    "value": "Column",
    "synonyms": [
      "Dorsal Spine"


One of the main drivers for Eva has been learning how users interact with chatbots and how we can use conversation design to drive empathy and ultimately, engagement. Paying attention to how users feel during conversations has been at the top of our priorities since day one.

Over time we have identified different positive and negative expressions from users that are related to sentiments. We have created Intents for each of these Expressions to respond with something that makes sense: we apologize to users who are disappointed and respond with love to users who send Eva love. Eva can respond to laughs: she may ask you to share the chatbot link with your friends now that you are happy, or just respond with another laugh or an emoji. We have programmed her with different answers to each of these sentiment intents so that she is always somehow surprising for users who are having a more personal conversation with her.

In the future, we would love to have an excuse to add Sentiment Analysis APIs to Eva. If the conversation volume justifies the internal development cost, this is something that could be very interesting to do. For now, Eva’s pre-programmed answers are working just fine.

Training Eva

When we identify expressions that don’t match an existing Intent we need to decide if we want to redirect users to the Default Answer or to another Intent, or if we want to create a new Intent to satisfy future similar expressions.

Intents created from the Training interface only have a name and the origin Expression, so we add “aa” at the beginning of the Intent name so that it is positioned at the top of the Intent list. This way, after we review all unmatched Expressions in Training, we can go to the Intents list and start the work at the top of the list.
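The “aa” prefix works because the Intent list is sorted alphabetically, so prefixed names float to the top. A quick illustration with mostly made-up intent names:

```python
intents = [
    "health.UTI",
    "aa-new-intent-from-training",
    "nav.setgoals",
    "meta.areyouachatbot",
]

# Alphabetical sort, as in the Dialogflow intent list: "aa…" names come first.
for name in sorted(intents):
    print(name)
```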

Training a chatbot for NLP means dealing with these 10 common challenges. Some of them can be easily addressed once we are aware of their existence (abbreviations, for example, can be easily addressed at the beginning of a project with a little domain and user persona research) while others like context and feature discovery will continue to be a challenge for a while longer.


We run analytics in Chatfuel and in Dialogflow. Each tool provides different information, and we do some data processing in Google Sheets for the indicators neither platform provides. These indicators are usually the closest to the users, so we pay a great deal of attention to these manual analytics. “Manual” is a relative term: we take advantage of templates and the great Explore feature in Google Sheets, so a lot of the work is actually automated or somehow computer-assisted.

Dialogflow analytics are restricted to the expressions that can’t be resolved by the Chatfuel AI, so we use them specifically to evaluate the NLP performance. A bit of information we love is the flow view, which shows how users move from one Intent to another and which we use as a measure of improvement of our natural language understanding. Don’t take this too seriously, though: users will always think of new, fun things to ask your conversational interface.

Chatfuel gives us more general analytics and is actually very useful when we need to understand how things are going. Analytics we extract from this tool include the total number of users, the new and blocked users per day, the active users, divided by type of interaction, and the number of visits to each block of the conversation.

We pay for the PRO version of Chatfuel for two reasons. First, it lets us remove the branded message at the beginning of our conversation. Second, and perhaps more importantly, it gives us access to the People view.

The People view lists all users of your bot and their system and custom attributes. You can filter users and save segmented views for easy reference, and you can download all the data as a .csv file to run a manual analysis.

Once you reach this point, the sky is the limit! With all your attributes in .csv format and the right tools (I simply use the Explore feature of Google Sheets), you can run your own analysis and print progress charts.
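For a first pass over the exported attributes, the standard csv module is enough. The column names below are assumptions about what a People view export might contain, and the data is inlined in place of the downloaded file:

```python
import csv
import io
from collections import Counter

# Inline stand-in for the downloaded People view .csv; real exports will
# have different (and more) columns.
raw = """user_id,goal,sessions
1,Healthier eating habits,4
2,Quit Smoking,2
3,Healthier eating habits,7
4,,1
"""

rows = list(csv.DictReader(io.StringIO(raw)))
# Count users per goal, skipping users that never set one.
goal_counts = Counter(r["goal"] for r in rows if r["goal"])
print(goal_counts.most_common())
```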

A medical paper called “Improving Access to Online Health Information With Conversational Agents: A Randomised Controlled Experiment” has relevant conclusions.

Its authors asked participants to look for medical trials about specific diseases using a traditional keyword-based search engine and a conversational interface. They were then asked to repeat the task, made a bit harder by adding criteria the trial had to match.

The sample for their study was 89 people with an average age of 60 (very interesting!), a near 50/50 gender balance, and 23 participants with low health literacy. A fifth of them had previous experience looking for clinical trials.

The main finding was that participants were definitely more satisfied using the conversational interface.

“Results indicated that all participants were more satisfied with the conversational interface […] compared to the conventional Web form-based interface.”

It must be noted that, judging by its description, the conversational interface was an advanced one and probably a pleasure to use, with features like read-out-loud, bookmarking, and different levels of detail.

The paper also mentions a previous study in which the success rate was also higher in the conversational interface for those users with low health literacy, and gives a figure that may surprise some: 36% of USA adults are included in that definition.

“[Another study] demonstrated that individuals with low health literacy had lower success rates when using these interfaces to search for general health information on the Web. Usability by people with low health literacy is important because this population comprises 36% of US adults.”

Interestingly enough, none of the low health literacy users managed to find the clinical trial with the added constraints using keyword search, but around a third did using the conversational interface.

“In our standardised task (task 2), it is notable that none of the low health literacy participants were able to find a correct clinical trial using the conventional search engine interface, whereas 36% (5/14) were able to do so with the conversational agent.”

What I read here is: if you understand what your problem is, you can find the best solution on Google. For a person with lower health literacy, that can be a far more challenging task; indeed, none of the low health literacy participants passed with keyword search.

On the downside, the conversational interface takes a bit longer (around 30%), but participants were not bothered, and the time difference was actually perceived as shorter.

The final bits are encouraging:

Apparently, several studies have shown that traditional keyword search simply does not work for kids, the elderly, or people who speak a different native language.

Another study worked with conversational interfaces and demonstrated their success in making health easier to understand for those unfamiliar with it, in areas such as physical activity promotion, hospital discharge instructions, explanation of medical documents, and family health history-taking.

“Our findings suggest that conversational agent-based search engine interfaces could be a good alternative to conventional Web form-based interfaces for many kinds of applications, but especially for those intended for low health literacy users or those with limited computer experience or skills.”

Let’s talk.
