How Good is that App? Evaluation Criteria for Military Mental Health Providers

By Dr. Nigel Bush, Ph.D.
March 18, 2019

How often has a patient shown you a health-related smartphone app and asked your opinion? Or has a colleague recommended that you try an app they've incorporated into their practice? How do you judge the merits of a particular app among the plethora of options in the marketplace?

These days, health-related smartphone apps are everywhere. It's estimated that at least half of all smartphone users have downloaded one of the 300,000 health apps available. With more than 10,000 apps dedicated to mental health alone, clinicians who fail to keep up with the mobile tools used by their patients will clearly be left behind.

In an ideal world, any app considered for use or recommendation by a clinician would be held to the same level of scientific scrutiny as prescription medications or psychotherapeutic interventions. Conventional methods of clinical testing, however, take a long time – often three to five years – while mobile apps can be built in months. Apps are released and superseded so quickly that many become obsolete before rigorous testing could be completed.

In the absence of an evidence base, clinicians need some guidance in assessing app quality. Here, I'll suggest some basic criteria that providers should consider when judging the merits of a smartphone tool. Before adopting such a tool or implementing it in practice, the clinician should make every effort to verify that the app satisfies as many as possible of the following criteria, which are listed in ascending order of rigor:

Fundamental criteria

1. The app should have been tested throughout development by representatives of the actual audience who would be using it, to make sure it is usable, easily understood and navigated, functions well and reliably, and is satisfying. For example, if the app is being designed for use by active duty enlisted soldiers, it should be tested during development by active duty enlisted soldiers and changes should be made, if necessary, based on their input. This kind of testing is often known as usability, user experience, or human factors testing and should be a fundamental component of any app development.

2. Once developed but before being released, the app should be pilot tested in its intended environment (e.g. clinic, office, general public environment) with its intended audience, and be shown to be feasible for use in that environment and fully acceptable to the intended users. If the app is intended for use by mental health providers within an Army military treatment facility (MTF), it should be pilot tested by mental health providers within an Army MTF. It seems obvious, but it is surprising how often this step is either skipped or carried out with a more convenient location and audience instead. Feasibility/acceptability piloting should be a fundamental aspect of app development.

3. The app's content should be maintained and regularly updated with new version releases to reflect the current state of knowledge.

Required but scientifically sub-optimal criterion

4. The app may not have undergone full, formal clinical testing (see 5 and 6 below), but, at the very least, it should be based on solid, well-established scientific evidence. This means that any intervention delivered by the app should previously have been found to be effective, and that any assessment tool that users might complete on the app should previously have been demonstrated to be reliable and valid. In essence, the clinician is making a leap of faith that prior evidence from scientific testing in other contexts is already so strong, robust, and generalizable that a new mode of delivery (in this case via smartphone app) does not need to be tested.

Ideal criteria

5. The app will have undergone a full, formally conducted, published randomized controlled trial or quasi-experimental study showing efficacy/effectiveness in a target environment with a target audience. A prime example is the Virtual Hope Box app for emotional regulation, which also satisfies criteria 1–4.

6. The app will have undergone at least two published randomized controlled trials or quasi-experimental studies that replicate findings of efficacy/effectiveness in the target environment with the target audience. For example, the PTSD Coach app meets all six criteria.

Where can clinicians find the information to assess whether an app meets these criteria? First and foremost, they should always search the conventional scientific literature for scholarly publications. PubMed is a good place to start. The iPhone and Android app download stores and the internet in general may also prove useful for finding ratings, reviews, and testimonials from clinicians and users, data on usage, and reports not published in the academic literature. Text in the app itself is another place to look – check menus and "About" pages for indications of how current the content is.

The reality is that most mental health apps will only satisfy criteria 1–4. When the assessment or intervention to be delivered is simple, straightforward, and faithful to the (non-app) original version, that's probably OK. The clinician can be relatively confident in its basic clinical utility. For example, if a smartphone app coaches users in diaphragmatic breathing to relieve stress or anxiety (e.g. Virtual Hope Box, Breathe2Relax, Tactical Breather), there is already abundant evidence in the research literature for the effectiveness of that technique in non-app contexts. Similarly, if a standardized mental health self-assessment tool, such as the PTSD Checklist (PCL-5) or the Patient Health Questionnaire (PHQ-9), is presented by an app in a close approximation to the psychometrically validated originals (e.g. similar wording, item order, instructions, response choices), it's probably safe to assume that the app version is comparable.

However, for apps with content, functioning, and processes that diverge from tried and tested originals, a clinician needs to be more wary. Sometimes just a small change in the way a tool is displayed or the environment in which it is delivered can have a significant effect on its performance. The small size of a smartphone screen may dictate modifications to the layout of a tool that make its properties unpredictable. The discreet portability of a smartphone may allow mental health self-management tools, for example, to be used in situations for which the original tools were never intended – environments with inadequate privacy, distractions, interruptions, and noise. These areas of divergence may render previous evidence inapplicable, in which case the app must be shown to be fully evidence-based in its own right per criteria 5 or 6.

One last word about privacy and security. The suggestions above focus on practical criteria for levels of scientific evidence. A second, equally important concern for a clinician is whether a smartphone app collects, stores, and transmits sensitive and/or personal information about the user or others. In a clinical context especially, it is critical that personally identifiable information and protected health information are adequately protected.

Dr. Bush is chief of the Psychological Health Research branch at the Psychological Health Center of Excellence. He has a doctorate in psychology and is an affiliate associate professor in psychiatry and behavioral sciences at the University of Washington.

Last Updated: September 14, 2023
