How we support the “4–5× more honest responses” claim
If you are comparing survey tools, bold marketing should meet plain English: what is being measured, how the test is run, and how you can reproduce it on your own traffic. This page is that explanation.
1. What “honesty” means in practice
“Honesty” is not a single laboratory measurement. In organizational and customer feedback, researchers and operators use observable proxies for candid input, including:
- Participation among people who could respond. When people trust the channel, more of them complete the pulse instead of bouncing after opening a link or scanning a QR code.
- Comment depth. Short, empty praise (“great!”) is easy; specific concerns usually show up as longer, concrete open-ended replies.
- Reduced “ceiling stacking.” When every score is a 9 or 10 but comments are empty, that pattern often indicates social desirability—not proof that nothing is wrong. We look at the combination of scores and text, not scores alone.
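For illustration, here is a minimal sketch of the ceiling-stacking check described in the last bullet, assuming each exported response is a simple record with a 0–10 rating and an optional comment (the field names are hypothetical; adapt them to your own export):

```python
# Illustrative sketch only. "rating" and "comment" are hypothetical field names;
# match them to whatever your survey export actually contains.

def ceiling_stacking_share(responses, ceiling=9, min_comment_chars=25):
    """Share of responses that sit at the top of the scale (>= ceiling)
    but carry no substantive comment."""
    if not responses:
        return 0.0
    stacked = [
        r for r in responses
        if r["rating"] >= ceiling
        and len((r.get("comment") or "").strip()) < min_comment_chars
    ]
    return len(stacked) / len(responses)

sample = [
    {"rating": 10, "comment": ""},
    {"rating": 9,  "comment": "great!"},
    {"rating": 10, "comment": None},
    {"rating": 7,  "comment": "The 2pm rush leaves one person alone at the counter."},
]
print(ceiling_stacking_share(sample))  # 0.75 -> three of four are top scores with empty text
```

A high share here is a prompt to look harder at trust and question phrasing, not a verdict on its own.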
Wellness Pulse is designed to increase trust (no accounts, no cookies, no vendor login wall on the respondent side), which is why we expect those proxies to move—not because we “score honesty” as a hidden variable.
2. Fair side-by-side test (what a skeptical owner should ask for)
A defensible comparison matches audience, moment, and effort as closely as possible. A standard pilot looks like this:
- Same intent, parallel instruments. Use the same number of rating questions and the same open-ended prompt (or very close equivalents). Minor wording differences are fine if both forms are pre-tested for clarity.
- Same promotion window. Run both tools over the same calendar stretch (for example two full weeks), with the same number of reminders and the same placement (counter cards, email footers, posters).
- Split or rotate traffic fairly. Either split locations/teams 50/50 or alternate weeks by segment, so the same cohort is not always the busier or more fatigued one. Document which group saw which tool (a minimal rotation sketch follows at the end of this section).
- Count “eligible” the same way. Typical choices: unique QR opens / link opens that load the first question (starts), or physical impressions if you control distribution tightly. Pick one rule up front and stick to it.
- Exclude junk. Drop obvious test submissions from staff and duplicate device hits if your tooling exposes them.
That design answers the objection, “You just sent more reminders for Wellness Pulse.” If reminders and placements are matched, the comparison is about the channel, not hustle.
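Here is a minimal sketch of the rotation idea, assuming you supply your own list of segments and a pilot length in whole weeks (the segment names and tool labels below are placeholders, not a prescribed setup):

```python
from itertools import cycle

def rotation_plan(segments, weeks, tools=("Wellness Pulse", "Legacy tool")):
    """Alternate which tool each segment sees week by week, so neither tool
    always gets the same (possibly busier or more fatigued) cohort.
    Returns (week, segment, tool) rows to file before the pilot starts."""
    plan = []
    for week in range(1, weeks + 1):
        # Shift the starting tool each week so segment 1 is not always paired with tool 1.
        order = cycle(tools if week % 2 == 1 else tuple(reversed(tools)))
        for segment in segments:
            plan.append((week, segment, next(order)))
    return plan

# Placeholder segments; use your real locations or teams.
for row in rotation_plan(["Downtown", "Airport", "Mall kiosk"], weeks=2):
    print(row)
```

Filing the assignment list alongside the exports is what makes “document which group saw which tool” verifiable after the fact.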
3. What we report after the pilot
We focus on metrics you can export and verify:
- Completion rate = completed responses ÷ eligible starts (or your pre-defined denominator).
- Substantive comment rate = share of completed responses whose open text meets a transparent rule (for example at least 25 characters after trimming, excluding single-word fluff like “good” or “none”).
- Optional: simple distribution notes (range and variance on ratings) to spot ceiling effects—interpreted cautiously and alongside comments.
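A minimal sketch of how both metrics could be computed from a raw export, assuming each row carries a completion flag and the open-text field (the field names and filler list are assumptions, not a fixed schema):

```python
# Hypothetical field names ("completed", "comment"); match them to your export.
FILLER = {"good", "none", "ok", "fine", "n/a"}  # single-word fluff to exclude

def is_substantive(comment, min_chars=25):
    text = (comment or "").strip()
    # With min_chars=25 the filler check is a safety net; it matters if you lower the threshold.
    return len(text) >= min_chars and text.lower() not in FILLER

def pilot_metrics(responses, eligible_starts):
    completed = [r for r in responses if r["completed"]]
    completion_rate = len(completed) / eligible_starts
    substantive = sum(1 for r in completed if is_substantive(r.get("comment")))
    substantive_rate = substantive / len(completed) if completed else 0.0
    return {"completion_rate": completion_rate,
            "substantive_comment_rate": substantive_rate}
```

Run the same function over both tools' exports, with the denominator rule you agreed on up front, and compare the results.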
The headline “4–5×” on our homepage refers to the ratio of those completion rates (and often comment substance) in pilots where legacy tools underperform because respondents do not trust the link or the login surface. Your ratio depends on your audience and execution; that is why we put a money-back challenge next to the claim on the homepage—so you can run the pilot and judge with your own numbers.
4. Illustrative worked example (not one customer; not a forecast)
The table below is a rounded, synthetic example to show how we calculate the ratio. It is not a guarantee of your outcome and not attributed to a single named organization. Real pilots vary by industry, season, and how tightly you match promotion.
| Metric | Legacy link tool (e.g. SurveyMonkey / Google Forms) | Wellness Pulse | Ratio (Pulse ÷ legacy) |
|---|---|---|---|
| Eligible starts (unique sessions that loaded Q1) | 820 | 795 | — |
| Completed responses | 41 | 198 | — |
| Completion rate | 5.0% | 24.9% | ≈5.0× |
| Substantive open text (≥25 chars, rule-based filter) | 9 of 41 (22%) | 102 of 198 (52%) | ≈2.4× comment rate |
In this worked example, “4–5×” comes from the completion-rate ratio. Comment depth also rises, but not always by the same factor—both numbers matter for operations.
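The arithmetic behind the table, spelled out in a few lines so the rounding is easy to audit:

```python
legacy_completion = 41 / 820    # 5.0% completion rate
pulse_completion  = 198 / 795   # ~24.9% completion rate
print(round(pulse_completion / legacy_completion, 2))   # 4.98 -> reported as ~5.0x

legacy_comments = 9 / 41        # ~22% substantive comments
pulse_comments  = 102 / 198     # ~52% substantive comments
print(round(pulse_comments / legacy_comments, 2))       # 2.35 -> reported as ~2.4x
```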
5. Limitations (we will not hand-wave these)
- Denominator choice changes the ratio. If legacy tools count every email impression but Pulse counts only QR scans, the comparison is unfair. Pick denominators together before data collection.
- Seasonality. Holiday weeks, exams, or layoff rumors move response behavior. Match windows or run A/B by location.
- Comment rules are imperfect. A long rant can be unhelpful; a short line can be gold. Rules are for consistency across tools, not for perfect semantic judgment.
6. Where to see qualitative proof
For narrative outcomes (retention, stars, operational fixes), see written success stories on the homepage and industry case studies. Those describe what changed after teams heard more signal; this page describes how we count signal in a head-to-head pilot.
Disclaimer: This methodology page describes a recommended evaluation design. Features, guarantees, and commercial terms may change; refer to your agreement at signup for refund or challenge eligibility. If something here should be clearer, tell us and we will update it.