A heuristic evaluation and cognitive walkthrough of RunwayML
Note: these evaluations were carried out in the late spring of 2021, several months before writing this. Perhaps the platform has addressed these usability issues since then.
RunwayML was co-founded in 2019 by Cristóbal Valenzuela and a couple of friends for artists and creators to use machine learning and artificial intelligence techniques in their work. It is currently available as a web app and a downloadable platform.
This evaluation was part of a class assignment on usability. I chose RunwayML precisely because of the interesting intersection of machine learning and user experience that it is designed for. It was designed for artists first, not business use cases. According to Valenzuela:
“Runway’s main goal is to make the process of using a state of the art machine learning model as easy as possible. While learning about the data behind these models and the training process is important, this project is not about creating the right training environment to deploy models to production. It is not about training an algorithm and it’s not about hyper-parameters or hardcore data science. It is a project built around the simple idea of making models accessible to people, so they can start thinking of new ways to use those models. And from there on, have a better understanding about how machine learning works. A process of learning by doing.”
This post seeks to address the usability of RunwayML as a platform and to offer suggestions for possible improvements.
Assumptions about the users of RunwayML that I’m writing about:
- they are not data scientists or machine learning experts, and thus have no in-depth knowledge of machine learning systems or pipelines
- they are familiar with common types of software and web apps
- they tend to use RunwayML for creative purposes
Additionally, I completed this usability study using the web app version of RunwayML in both the Google Chrome and Safari web browsers.
Heuristic Evaluation Summary
For this evaluation, I used Nielsen’s 10 Usability Heuristics for User Interface Design. What follows is only a summary of a larger evaluation, covering the most important findings about what RunwayML’s site does well and what it does poorly.
The Good: Effective Use of Language
Maps to Nielsen’s Usability Heuristic #2: Match between system and the real world
RunwayML is not geared towards machine learning specialists, data scientists, or AI developers. The model catalog avoids specialist terms such as “natural language processing” or “computer vision”. It instead uses terms that most people can understand, like “text generation” or “image analysis”.
The Bad: What’s Going on With the Model?
Maps to Nielsen’s Usability Heuristics #1: Visibility of System Status and #5: Error Prevention.
Description: When trying to train a language model to give output, the images above were the four messages given to me in sequence, from 1 to 4.
Violation: While Buttons #1 and #2 are informative, after that the user really has no idea what is going on. Button #3 is vague, and Button #4 tells the user nothing about what is happening.
Recommendation: A progress bar after Button #1 would be the most informative feedback to give the user on the progress of model training. As it stands, the user has no idea whether the model isn’t working or whether they’ve made an error. Unless these issues are corrected, users will get frustrated, and this will in turn make the system seem unreliable.
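To make the recommendation concrete, here is a minimal sketch of the kind of status feedback I have in mind. Everything in it is an assumption for illustration: the stage names, the `JobStatus` fields, and the message wording are hypothetical and are not taken from RunwayML's actual interface or API. It only shows how each training stage could map to an explicit message, with a progress bar while training is running.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """Hypothetical training stages; not RunwayML's real state names."""
    QUEUED = "queued"
    TRAINING = "training"
    FINISHED = "finished"
    FAILED = "failed"


@dataclass
class JobStatus:
    """Hypothetical snapshot of a training job."""
    stage: Stage
    steps_done: int = 0
    steps_total: int = 0
    error: str = ""


def progress_bar(done: int, total: int, width: int = 20) -> str:
    """Render a text progress bar, clamping the fraction to the 0..1 range."""
    fraction = 0.0 if total <= 0 else min(max(done / total, 0.0), 1.0)
    filled = int(fraction * width)
    return f"[{'#' * filled}{'-' * (width - filled)}] {int(fraction * 100)}%"


def status_message(status: JobStatus) -> str:
    """Turn a job status into a message that says what is happening."""
    if status.stage is Stage.QUEUED:
        return "Waiting for a free machine. Training has not started yet."
    if status.stage is Stage.TRAINING:
        bar = progress_bar(status.steps_done, status.steps_total)
        return f"Training: {bar} (step {status.steps_done} of {status.steps_total})"
    if status.stage is Stage.FINISHED:
        return "Training finished. Your model is ready to use."
    # Any remaining stage is treated as a failure and says so explicitly.
    return f"Training stopped because of an error: {status.error or 'unknown error'}"


if __name__ == "__main__":
    print(status_message(JobStatus(Stage.TRAINING, steps_done=10, steps_total=20)))
```

The point of the sketch is that every state, including failure, produces a message that distinguishes "still working" from "something went wrong", which is exactly what Buttons #3 and #4 fail to do.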
I looked at three tasks in my cognitive walkthrough of the RunwayML site:
Task #1: The user uploads a file or folder for use in one of RunwayML’s offered models or actions.
Task #2: The user colorizes a black and white image.
Task #3: The user uses a model to generate fake BTS lyrics.
All three of the tasks went relatively smoothly. Uploading files and folders on the site follows standard conventions for uploads on most web sites and web apps, with buttons and conventions that users are able to follow easily. Actions throughout are indicated clearly by the visual hierarchy, cursor changes, and language.
Getting output, however, seems less straightforward. With both Tasks #2 and #3, getting model output took extra effort on the part of the user. Friction can be useful for drawing the user’s attention to details such as privacy and security settings, or for having them examine additional fees during a transaction, but there is no clear reason for it here.
Action: Get New Data
When colorizing a black and white image, the user only sees the message “In Progress…” as shown in the image on the left above. The user has to check the email account they used to register for the site to find out whether their image has finished processing. The user can follow the link in the email back to the RunwayML site, where their completed asset is waiting to be used or downloaded. I also discovered that the user can refresh the website to find their completed asset. The site doesn’t update automatically or give the user a message that the model has finished processing the image.
Note: I was colorizing an image I found while researching for an art project I was doing related to the films of Wong Kar-Wai, thus the image from 2046 as shown above.
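The missing automatic update in the colorization task could be handled by checking the job status in the background and telling the user when it changes. The following is a minimal sketch of that idea, not a description of how RunwayML works: `fetch_status`, `notify`, and the status strings "done" and "failed" are all hypothetical placeholders, and in a real web app this logic would live in the front end rather than in Python.

```python
import time
from typing import Callable


def wait_for_result(
    fetch_status: Callable[[], str],
    notify: Callable[[str], None],
    poll_seconds: float = 2.0,
    max_polls: int = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a job until it finishes, notifying the user on every change.

    fetch_status and notify are hypothetical hooks: the first asks the
    server for the current status string, the second shows it to the user.
    Returns True if the job ended with "done", False otherwise.
    """
    last = None
    for _ in range(max_polls):
        status = fetch_status()
        if status != last:
            # Only speak up when something actually changed.
            notify(status)
            last = status
        if status in ("done", "failed"):
            return status == "done"
        sleep(poll_seconds)
    # Give up after max_polls checks and say so instead of staying silent.
    notify("timed out")
    return False


if __name__ == "__main__":
    fake_statuses = iter(["in progress", "in progress", "done"])
    finished = wait_for_result(lambda: next(fake_statuses), print, sleep=lambda s: None)
    print("finished:", finished)
```

With something like this in place, the user would not need to refresh the page or check their email to learn that the image is ready.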
Action: Get Results
When using the BTS lyric generation model, which generates lyrics in the style of BTS from lyrics or words the user inputs, I discovered another usability issue with RunwayML’s model output. After the model is supposedly engaged, the red button in the image on the left above merely says “Stop”, and the user has no idea what the status of the model is. The user must either check their Assets page on RunwayML (which means leaving the model page, or opening a new window or tab in their browser) or check their email, as with the image task above. For any batch results, the user has to download a zip file (unless the user is technically capable enough to use RunwayML’s API) and examine the results locally instead of within the web app.
For the really basic function of uploading data, RunwayML is fine. It is nothing revolutionary, but it doesn’t need to be. If the web app couldn’t handle that, it wouldn’t even be a functional app.
Again, when it comes to showing the user where their data is, whether during a simple process such as colorizing an image or during a batch operation with a pre-existing machine learning model, the system shows weaknesses. It needs to pay greater attention to communicating with the user, giving useful feedback and offering more seamless interaction.
One thing that shows pretty clearly is that the developers seem to be designing the model sections around the paradigm of the first part of a machine learning pipeline rather than around how to teach a user the basics they need to use a model well. The lack of clear feedback in the model running stage almost defeats the purpose of RunwayML as a no-code machine learning solution. Before giving a final evaluation of the platform’s functionality, however, I would need to evaluate it with the API and a URL endpoint.
Conclusions and Recommendations
The summary of RunwayML’s usability:
Usable: RunwayML is usable, if a bit annoying to use as a web app.
Falls Short: It has some usability issues that seem unacceptable for a commercial product.
Fun, not Serious: I recommend playing with the app, but wouldn’t use it for commercial purposes just yet.
A few recommendations for RunwayML’s UX team:
- Create a Usage Maturity Matrix to transition users from ML novices to “expert enough” for the no-code ML platform. Even non-expert users, for instance, should understand the concept of benchmarking in some form, even for images or words to be used in artistic endeavors.
- Create a common design language to fix readability and consistency issues
- Design models beyond the machine learning pipeline paradigm for non-developer users of machine learning products.
No-code ML platforms are critical for the future of ML and AI beyond the experts. It’s crucial for users of ML to develop the required skills, such as understanding benchmarking and ethical considerations, so that the non-technical creators and users of ML and AI don’t make the same mistakes that many users of ML have already made. This is especially important as these technologies are adopted into greater commercial and artistic use, so that creative AI adoption doesn’t hit a brick wall.
Cristóbal Valenzuela. 2018. Machine Learning En Plein Air: Building accessible tools for artists.