
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Wed, 15 Apr 2026 21:09:52 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Extending Cloudflare Radar’s security insights with new DDoS, leaked credentials, and bots datasets]]></title>
            <link>https://blog.cloudflare.com/cloudflare-radar-ddos-leaked-credentials-bots/</link>
            <pubDate>Tue, 18 Mar 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ For Security Week 2025, we are adding several new DDoS-focused graphs, new insights into leaked credential trends, and a new Bots page to Cloudflare Radar.  ]]></description>
<content:encoded><![CDATA[ <p>The security and attack landscape remains highly active, and the visibility that Cloudflare Radar provides into this dynamic environment has evolved and expanded over time. To that end, during 2023’s Security Week, we <a href="https://blog.cloudflare.com/radar-url-scanner-early-access/"><u>launched our URL Scanner</u></a>, which enables users to safely scan any URL to determine if it is safe to view or interact with. During 2024’s Security Week, we <a href="https://blog.cloudflare.com/email-security-insights-on-cloudflare-radar/"><u>launched an Email Security page</u></a>, which provides a unique perspective on the threats posed by malicious emails, spam volume, the adoption of <a href="https://www.cloudflare.com/learning/email-security/dmarc-dkim-spf/"><u>email authentication methods like SPF, DMARC, and DKIM</u></a>, and the use of IPv4/IPv6 and TLS by email servers. For Security Week 2025, we are adding several new DDoS-focused graphs, new insights into leaked credential trends, and a new <b>Bots</b> page to Cloudflare Radar. We are also taking this opportunity to <a href="https://www.cloudflare.com/learning/cloud/how-to-refactor-applications/">refactor</a> Radar’s <b>Security &amp; Attacks</b> page, breaking it out into <b>Application Layer</b> and <b>Network Layer</b> sections.</p><p>Below, we review all of these changes and additions to Radar.</p>
    <div>
      <h3>Layered security</h3>
    </div>
<p>Since Cloudflare Radar launched in 2020, it has included both network layer (Layers 3 &amp; 4) and application layer (Layer 7) attack traffic insights on a single <b>Security &amp; Attacks</b> page. Over the last four-plus years, we have evolved some of the existing data sets on the page and added new ones. As the page has grown and improved over time, it risked becoming unwieldy to navigate, making it hard to find the graphs and data of interest. To help address that, the <b>Security</b> section on Radar now features separate <a href="https://radar.cloudflare.com/security/application-layer"><b><u>Application Layer</u></b></a> and <a href="https://radar.cloudflare.com/security/network-layer"><b><u>Network Layer</u></b></a> pages. The <b>Application Layer</b> page is the default, and includes insights from analysis of HTTP-based malicious and attack traffic. The <b>Network Layer</b> page includes insights from analysis of network and transport layer attacks, as well as observed <a href="https://blog.cloudflare.com/tcp-resets-timeouts/"><u>TCP resets and timeouts</u></a>. Future security and attack-related data sets will be added to the relevant page. <a href="https://radar.cloudflare.com/email-security"><b><u>Email Security</u></b></a> remains on its own dedicated page.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3zxFA8iG7N9MZQzAQAaCNf/ee92891b74b0d70052cc43239dad656f/image__2_.png" />
          </figure>
    <div>
      <h3>A geographic and network view of application layer DDoS attacks</h3>
    </div>
    <p>Radar’s <a href="https://radar.cloudflare.com/reports"><u>quarterly DDoS threat reports</u></a> have historically provided insights, aggregated on a quarterly basis, into the top source and target locations of application layer DDoS attacks. A <a href="https://radar.cloudflare.com/security/application-layer#application-layer-ddos-attacks-distribution"><u>new map and table</u></a> on Radar’s <b>Application Layer</b> Security page now provide more timely insights, with a global choropleth map showing a geographical distribution of source and target locations, and an accompanying list of the top 20 locations by share of all DDoS requests. Source location attribution continues to rely on the geolocation of the IP address originating the blocked request, while target location remains the billing location of the account that owns the site being attacked. </p><p>Over the first week of March 2025, the United States, Indonesia, and Germany were the top sources of <a href="https://www.cloudflare.com/learning/ddos/application-layer-ddos-attack/">application layer DDoS attacks</a>, together accounting for over 30% of such attacks as shown below. The concentration across the top targeted locations was quite different, with customers from Canada, the United States, and Singapore attracting 56% of application layer DDoS attacks.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7nW8CyC5L4s68ntQEe1pfT/a8b571b9b2325f5f71936367e8879af1/image10.png" />
          </figure><p>In addition to extended visibility into the geographic source of application layer DDoS attacks, we have also added <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/"><u>autonomous system (AS)</u></a>-level visibility. A <a href="https://radar.cloudflare.com/security/application-layer#application-layer-ddos-attacks-source-as-distribution"><u>new treemap</u></a> view shows the distribution of these attacks by source AS. At a global level, the largest sources include cloud/hosting providers in Germany, the United States, China, and Vietnam.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/332A6OGGKUNCE0wLVBKBmd/2b93c6bd46105602214d7ced2ead71b6/image7.png" />
          </figure><p>For a selected country/region, the treemap displays a source AS distribution for attacks observed to be originating from that location. In some countries/regions, the sources of attack traffic are heavily concentrated in consumer/business network providers, such as in <a href="https://radar.cloudflare.com/security/application-layer/pt#application-layer-ddos-attacks-source-as-distribution"><u>Portugal</u></a>, shown below. However, in other countries/regions that have a large cloud provider presence, such as <a href="https://radar.cloudflare.com/security/application-layer/ie#application-layer-ddos-attacks-source-as-distribution"><u>Ireland</u></a>, <a href="https://radar.cloudflare.com/security/application-layer/sg#application-layer-ddos-attacks-source-as-distribution"><u>Singapore</u></a>, and the <a href="https://radar.cloudflare.com/security/application-layer/us#application-layer-ddos-attacks-source-as-distribution"><u>United States</u></a>, ASNs associated with these types of providers are the dominant sources. Indeed, Singapore was listed among the top sources of application layer DDoS attacks in each of the quarterly DDoS threat reports in 2024.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3RpwYjFjmTR2WFx5UpsUuS/5211c6e5afd4733185ba1ed03750d1d2/image3.png" />
          </figure>
    <div>
      <h3>Have you been pwned?</h3>
    </div>
    <p>Every week, it seems like there’s another headline about a <a href="https://www.cloudflare.com/learning/security/what-is-a-data-breach/">data breach</a>, talking about thousands or millions of usernames and passwords being stolen. Or maybe you get an email from an identity monitoring service that your username and password were found on the “dark web”. (Of course, you’re getting those alerts thanks to a complimentary subscription to the service offered as penance from another data breach…)</p><p>This credential theft is especially problematic because people often reuse passwords, despite best practices advising the use of strong, unique passwords for each site or application. To help mitigate this risk, <a href="https://blog.cloudflare.com/a-safer-internet-with-cloudflare/#account-takeover-detection"><u>starting in 2024</u></a>, Cloudflare began enabling customers to scan authentication requests for their websites and applications using a <a href="https://blog.cloudflare.com/privacy-preserving-compromised-credential-checking/"><u>privacy-preserving</u></a> compromised credential checker implementation to detect known-leaked usernames and passwords. Today, we're using aggregated data to display trends in how often these leaked and stolen credentials are observed across Cloudflare's network. (Here, we are defining “leaked credentials” as usernames or passwords being found in a public dataset, or the username and password detected as being similar.)</p><p><a href="https://developers.cloudflare.com/waf/detections/leaked-credentials/#how-it-works"><u>Leaked credentials detection</u></a> scans incoming HTTP requests for known authentication patterns from common web apps and any custom detection locations that were configured. The service uses a privacy-preserving compromised credential checking protocol to compare a hash of the detected passwords to hashes of compromised passwords found in databases of leaked credentials. 
A <a href="https://radar.cloudflare.com/security/application-layer#leaked-credentials-usage"><u>new Radar graph</u></a> on the worldwide <b>Application Layer</b> Security page provides visibility into aggregate trends around the detection of leaked credentials in authentication requests. Filterable by authentication requests from human users, bots, or all (human + bot), the graph shows the distribution of requests classified as “clean” (no leaked credentials detected) and “compromised” (leaked credentials, as defined above, were used). At a worldwide level, we found that for the first week of March 2025, leaked credentials were used in 64% of all, over 65% of bot, and over 44% of human authentication requests.</p>
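<p>As a rough illustration of the idea behind this kind of check (a deliberately simplified sketch, not Cloudflare’s actual protocol, which uses a more sophisticated privacy-preserving design; the names <code>BREACH_BUCKETS</code>, <code>add_breached_password</code>, and <code>is_compromised</code> are hypothetical), a client can hash a password locally, reveal only a short hash prefix to the server, and check the returned bucket for a full match:</p>

```python
import hashlib

# Hypothetical in-memory breach database, bucketed by a short hash prefix.
# In a real deployment the buckets live server-side; the client only ever
# sends the prefix, never the password or its full hash.
BREACH_BUCKETS: dict[str, set[str]] = {}

def add_breached_password(password: str) -> None:
    digest = hashlib.sha256(password.encode()).hexdigest()
    BREACH_BUCKETS.setdefault(digest[:5], set()).add(digest)

def is_compromised(password: str) -> bool:
    # Client hashes locally, asks the server for the bucket matching the
    # 5-character prefix, then checks membership locally.
    digest = hashlib.sha256(password.encode()).hexdigest()
    return digest in BREACH_BUCKETS.get(digest[:5], set())

add_breached_password("password123")
assert is_compromised("password123")
assert not is_compromised("correct-horse-battery-staple")
```

<p>Because each prefix matches many unrelated hashes, the server learns very little about which password was checked, while lookups stay fast.</p>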
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5r8sHbOTrQ1ceGpLa0d3Dn/4ea9ab5b342cd394bd096349d4907ab0/image6.png" />
          </figure><p>This suggests that from a human perspective, password reuse is still a problem, as is users’ failure to take immediate action to change passwords when notified of a breach. And from a bot perspective, this suggests that attackers know that there is a good chance that leaked credentials for one website or application will enable them to access that same user’s account elsewhere.</p><p>As a complement to the leaked credentials data, Radar is also now providing a worldwide view into the <a href="https://radar.cloudflare.com/security/application-layer#bot-vs-human"><u>share of authentication requests originating from bots</u></a>. Note that not all of these requests are necessarily malicious — while some may be associated with <a href="https://www.cloudflare.com/learning/bots/what-is-credential-stuffing/">credential stuffing-style attacks</a>, others may be from automated scripts or other benign applications accessing an authentication endpoint. (Having said that, automated malicious attack request volume far exceeds legitimate automated login attempts.) During the first week of March 2025, we found that over 94% of authentication requests came from bots (were automated), with the balance coming from humans. Over that same period, <a href="https://radar.cloudflare.com/traffic?dateStart=2025-03-01&amp;dateEnd=2025-03-07#bot-vs-human"><u>bot traffic only accounted for 30% of overall requests</u></a>. So although bots don’t represent a majority of request traffic, authentication requests appear to comprise a significant portion of their activity.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/20tYE9sH7SOnJDsnh28ZYz/3c33320a0a7406a348c4b92d22f469ed/image4.png" />
          </figure>
    <div>
      <h3>Bots get a dedicated page</h3>
    </div>
    <p>As a reminder, <a href="https://www.cloudflare.com/learning/bots/what-is-a-bot/"><u>bot</u></a> traffic describes any non-human Internet traffic, and monitoring bot levels can help spot potential malicious activities. Of course, bots can be helpful too, and Cloudflare maintains a list of <a href="https://radar.cloudflare.com/bots#verified-bots"><u>verified bots</u></a> to help keep the Internet healthy. Given the importance of monitoring bot activity, we have launched a new dedicated <a href="https://radar.cloudflare.com/bots"><b><u>Bots</u></b></a> page in the Traffic section of Cloudflare Radar to support these efforts. For both worldwide and location views over the selected time period, the page shows the distribution of bot (automated) vs. human HTTP requests, as well as a graph showing bot traffic trends. (Our <a href="https://developers.cloudflare.com/bots/concepts/bot-score/"><u>bot score</u></a>, combining machine learning, heuristics, and other techniques, is used to identify automated requests likely to be coming from bots.) </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5bvoSNcGRbc2I5RGhZoIdo/948eca1e82148a1db6130295c0ea2f42/image2.png" />
          </figure><p>Both the <a href="https://radar.cloudflare.com/year-in-review/2023/#bot-traffic-sources"><u>2023</u></a> and <a href="https://radar.cloudflare.com/year-in-review/2024#bot-traffic-sources"><u>2024</u></a> Cloudflare Radar Year in Review microsites included a “Bot Traffic Sources” section, showing the locations and networks from which Cloudflare determined the largest shares of <a href="https://developers.cloudflare.com/bots/concepts/bot-score/"><u>automated/likely automated</u></a> traffic originated. However, these traffic shares were published just once a year, aggregating traffic from January through the end of November.</p><p>In order to provide a more timely perspective, these insights are now available on the new Radar Bots page. Similar to the new <a href="https://www.cloudflare.com/learning/ddos/what-is-a-ddos-attack/">DDoS attacks</a> content discussed above, the <a href="https://radar.cloudflare.com/bots#bot-traffic-sources"><u>worldwide view</u></a> includes a choropleth map and table illustrating the locations originating the largest shares of all bot traffic. (Note that a similar <a href="https://radar.cloudflare.com/traffic#traffic-characteristics"><u>Traffic Characteristics</u></a> map and table on the <a href="https://radar.cloudflare.com/traffic"><u>Traffic Overview page</u></a> ranks locations by the bot traffic share of the location’s total traffic.) Similar to the Year in Review data linked above, the United States continues to originate the largest share of bot traffic.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/RPCiFgeihzYIbm4XIdQ1r/ae9afa59586266c785e4e9dfa8cb3428/image11.png" />
          </figure><p>In addition, the worldwide view also breaks out <a href="https://radar.cloudflare.com/bots#bot-traffic-share-by-autonomous-system"><u>bot traffic share by AS</u></a>, mirroring the treemap shown in the Year in Review. As we have noted <a href="https://blog.cloudflare.com/radar-2024-year-in-review/#the-united-states-was-responsible-for-over-a-third-of-global-bot-traffic-amazon-web-services-was-responsible-for-12-7-of-global-bot-traffic-and-7-8-came-from-google"><u>previously</u></a>, cloud platform providers account for a significant amount of bot traffic.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5CsXZNKOZ2ssiVaUzVVM51/5d8ad8b6a257eb0413be19fa101806c9/image8.png" />
          </figure><p>At a location level, depending on the country/region selected, the top sources of bot traffic may be cloud/hosting providers, consumer/business network providers, or a mix. For instance, <a href="https://radar.cloudflare.com/bots/fr#bot-traffic-sources"><u>France’s distribution</u></a> is shown below, and four ASNs account for just over half of the country’s bot traffic. Of these ASNs, two (<a href="https://radar.cloudflare.com/as16276"><u>AS16276</u></a> and <a href="https://radar.cloudflare.com/as12876"><u>AS12876</u></a>) belong to cloud/hosting providers, and two (<a href="https://radar.cloudflare.com/as3215"><u>AS3215</u></a> and <a href="https://radar.cloudflare.com/as12322"><u>AS12322</u></a>) belong to network providers.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4RM9zvPRa8NzfdTOFMvMcB/b04d53dd725d1890918668775b551004/image9.png" />
          </figure><p>In addition, the <a href="https://radar.cloudflare.com/bots#verified-bots"><u>Verified Bots list</u></a> has been moved to the new Bots page on Radar. The data shown and functionality remains unchanged, and links to the old location will automatically be redirected to the new one.</p>
    <div>
      <h3>Summary</h3>
    </div>
    <p>The Cloudflare dashboard provides customers with specific views of security trends, application and network layer attacks, and bot activity across their sites and applications. While these views are useful at an individual customer level, aggregated views at a worldwide, location, and network level provide a macro-level perspective on trends and activity. These aggregated views available on Cloudflare Radar not only help customers understand how their observations compare to the larger whole, but they also help the industry understand emerging threats that may require action.</p><p>The underlying data for the graphs and data discussed above is available via the Radar API (<a href="https://developers.cloudflare.com/api/resources/radar/subresources/attacks/subresources/layer7/"><u>Application Layer</u></a>, <a href="https://developers.cloudflare.com/api/resources/radar/subresources/attacks/subresources/layer3/"><u>Network Layer</u></a>, <a href="https://developers.cloudflare.com/api/resources/radar/subresources/http/"><u>Bots</u></a>, <a href="https://developers.cloudflare.com/api/resources/radar/subresources/leaked_credentials/"><u>Leaked Credentials</u></a>). The data can also be interactively explored in more detail across locations, networks, and time periods using Radar’s <a href="https://radar.cloudflare.com/explorer"><u>Data Explorer and AI Assistant</u></a>. And as always, Radar and Data Explorer charts and graphs are downloadable for sharing, and embeddable for use in your own blog posts, websites, or dashboards.</p><p>If you share our security, attacks, or bots graphs on social media, be sure to tag us: <a href="https://x.com/CloudflareRadar"><u>@CloudflareRadar</u></a> and <a href="https://x.com/1111Resolver"><u>@1111Resolver</u></a> (X), <a href="https://noc.social/@cloudflareradar"><u>noc.social/@cloudflareradar</u></a> (Mastodon), and <a href="https://bsky.app/profile/radar.cloudflare.com"><u>radar.cloudflare.com</u></a> (Bluesky). 
If you have questions or comments, you can reach out to us on social media, or <a href="#"><u>contact us via email</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[Radar]]></category>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[Passwords]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">4VnSmFMYvyiJqbBjhjo0DH</guid>
            <dc:creator>David Belson</dc:creator>
        </item>
        <item>
            <title><![CDATA[Helping keep customers safe with leaked password notification]]></title>
            <link>https://blog.cloudflare.com/helping-keep-customers-safe-with-leaked-password-notification/</link>
            <pubDate>Mon, 24 Jun 2024 17:06:42 GMT</pubDate>
            <description><![CDATA[ To help protect against account compromise via credential stuffing attacks, Cloudflare will notify dashboard users when we detect that a password was found in an external data breach ]]></description>
<content:encoded><![CDATA[ <p>Password reuse is a real problem. When people use the same password across multiple services, it creates a risk that a breach of one service will give attackers access to a different, apparently unrelated, service. Attackers know people reuse passwords and build giant lists of known passwords and known usernames or email addresses.</p><p>If you got to the end of that paragraph and realized you’ve reused the same password in multiple places, stop reading and go change those passwords. We’ll wait.</p><p>To help protect Cloudflare customers who have used a password attackers know about, we are releasing a feature to improve the security of the Cloudflare dashboard for all our customers by automatically checking whether their Cloudflare user password has appeared in an attacker's list. Cloudflare will securely check a customer’s password against threat intelligence sources that monitor data breaches in other services.</p><p>If a customer logs in to Cloudflare with a password that was leaked in a breach elsewhere on the Internet, Cloudflare will alert them and ask them to choose a new password.</p><p>For some customers, the news that their password was known to hackers will come as a surprise – no one wants to intentionally use passwords that they know have been leaked elsewhere. To help customers avoid being locked out when they urgently need to use their Cloudflare dashboard, the leaked password check will provide a warning to the customer for the first three login attempts. After those three attempts, Cloudflare will require that the customer reset their password.</p><p>Resetting a leaked password is just the first step in Internet account security. The best way to protect your Cloudflare account, or any account, is to add two-factor authentication, using a hardware security key or an authenticator application, or to rely on a single sign-on integration. 
Cloudflare makes it easy for any user to <a href="https://developers.cloudflare.com/fundamentals/setup/account/account-security/2fa/#enable-two-factor-authentication-for-your-cloudflare-account">add two-factor authentication security to their account</a> through app-based codes, hardware keys, or passkeys. Cloudflare account Super Administrators can also require that all members enable two-factor authentication.</p><p>Whether or not a user has been impacted in a data breach, we encourage everyone to add two-factor authentication security to their Cloudflare account.</p>
    <div>
      <h3>How do credentials leak?</h3>
    </div>
    <p>Each time you authenticate to a service on the Internet with a username and password, that service can take a range of steps to protect your credentials.</p><p>More secure providers will hash the passwords. <a href="https://www.authgear.com/post/password-hashing-salting#hashing">Hashing</a> uses a cryptographic algorithm to convert the password into a random string of characters. Some platforms will layer on additional safeguards like a <a href="https://www.authgear.com/post/password-hashing-salting#salting">salt</a> mechanism that introduces a random value to each password before the hashing process to ensure that two identical passwords do not have identical hashes.</p><p>These protections, combined with rate limits on login attempts, prevent brute force attacks. However, even for providers that adopt these best practices, users can still become victims of determined attackers when bad actors gain access to breached password databases. Attackers can collect compromised email/password pairs to gain access to user accounts elsewhere as part of targeted attacks.</p><p>When vendors discover these kinds of compromised accounts, in many cases they will quickly force a password reset. However, resetting a leaked password in one application can still leave you vulnerable in other applications if you reused that password in other places and do not change your credentials everywhere.</p>
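<p>The salting-and-hashing flow described above can be sketched with Python’s standard library (illustrative only; the function names are ours, and real services choose their own algorithm, salt length, and iteration count):</p>

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes = None):
    """Return (salt, digest). A fresh random salt per password means two
    users with the same password still end up with different hashes."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)  # constant-time comparison

salt1, d1 = hash_password("hunter2")
salt2, d2 = hash_password("hunter2")
assert d1 != d2                          # same password, different hashes
assert verify_password("hunter2", salt1, d1)
assert not verify_password("wrong-guess", salt1, d1)
```

<p>The slow, iterated hash (PBKDF2 here) also makes offline brute-force attacks against a stolen database far more expensive than a single fast hash would.</p>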
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/410DAKcAIJaJmKI71DfN3L/71c68701609b8ead70dbdea2f37215fd/image3-13.png" />
            
            </figure><p>That kind of password reuse means that an attacker can steal your credentials from one Internet service and try them against dozens of other popular destinations to see where you reuse the same password.</p><p>These so-called <a href="https://www.cloudflare.com/learning/bots/what-is-credential-stuffing/">credential stuffing attacks</a> have become more prevalent as breaches pile up. Attackers can sit on large troves of credentials for months or years, waiting to sell them to another bad actor or to use them in targeted attacks. For customers who want to protect their end users’ accounts from these and other attacks, Cloudflare has solutions like <a href="/account-takeover-protection/">bot management, exposed credential checks, and rate limiting</a> available to help defend against them.</p>
    <div>
      <h3>How can customers protect against the impact of leaked credentials?</h3>
    </div>
    <p>If every password you use is unique, then a data breach in one vendor will be limited to just that particular system. For that reason, we encourage users to adopt a password manager that can create and remember unique passwords for each service that you use. Thankfully, the most popular operating systems now include password managers by default, and multiplatform third party options also make it easy for users to adopt this practice.</p><p>However, unique passwords are still vulnerable to phishing attacks. The best way to protect any Internet account is to add two-factor authentication (2FA). Two-factor authentication provides a comprehensive defense against credential stuffing attacks. When using two-factor authentication to log in, you must use your password and, for example, a one-time code from an app or a tap on a physical hardware key. The password alone is not enough to access your account.</p><p>Adoption of two-factor authentication, specifically hardware keys, has been <a href="https://www.microsoft.com/en-us/security/blog/2019/08/20/one-simple-action-you-can-take-to-prevent-99-9-percent-of-account-attacks/">shown to be able to eliminate 99.9% of account takeovers</a>, since the attacker must also get access to your second factor in addition to your password. In the case of hardware keys, they need to physically obtain the key.</p>
    <div>
      <h3>How does Cloudflare check for leaked credentials?</h3>
    </div>
    <p>When a user attempts to log into Cloudflare, we will check if the password used has been leaked in a known data breach of another service or application on the Internet. We maintain data on breaches of Internet services that we can use to search against. Because we use password hashes, scrambled versions of the original password that can’t be easily reversed, we compare a hash of the password to hashes of compromised passwords found in these lists from other attacks. An additional benefit, besides the security of hashes, is the ability to perform fast lookups, much faster than plaintext searches. This means that we can perform these checks without adding significant latency to the login process for users.</p><p>Because of the potential impact of a Cloudflare account being compromised, we opt for the more secure approach of disallowing leaked credentials regardless of whether they were associated with the specific user’s email or not. Unfortunately, data breaches will continue to happen, so being proactive reduces the risk that a newly breached email and password pair can be used to compromise an account before the breach data is available to us.</p><p>If we detect a match, the user will be prompted with the following warning. We will also send an email notification with instructions on what to do and a unique link in order to reset the password.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/bCm8LmZVjuGn70hDdjVT2/7c554325a98559d54ae2f5683f75090a/Screenshot-2024-06-24-at-10.16.02-AM.png" />
            
            </figure><p>At this point, the user will still be able to log in to their Cloudflare account. We strongly encourage users to reset their password immediately. However, we know that in some cases you need to reach the Cloudflare dashboard immediately or do not have convenient access to the email used for the account. Cloudflare will allow two additional login attempts to succeed with the same compromised password before forcing the user to reset their password.</p><p>To reset a password in the Cloudflare dashboard, navigate to the <a href="https://dash.cloudflare.com/profile/authentication"><b>Authentication</b> page</a> in <b>My Profile</b>. From here, select <b>Change Password</b> and enter both the current password to authenticate and a new, non-compromised password. Alternatively, users whose password was leaked will receive an email upon login with a unique link to reset the password.</p>
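<p>The warn-then-force-reset behavior described above can be summarized as a small state sketch (purely illustrative; the names <code>MAX_GRACE_LOGINS</code> and <code>handle_login</code> are ours, not Cloudflare’s implementation): the first login with a leaked password triggers a warning, two more warned logins are allowed, and any attempt after that requires a reset.</p>

```python
MAX_GRACE_LOGINS = 3  # warned logins allowed before a reset is required

def handle_login(user_state: dict, password_is_leaked: bool) -> str:
    if not password_is_leaked:
        return "ok"
    user_state["leaked_logins"] = user_state.get("leaked_logins", 0) + 1
    if user_state["leaked_logins"] <= MAX_GRACE_LOGINS:
        return "warn"        # login succeeds; user is warned and emailed
    return "force_reset"     # further logins require a password reset

state = {}
results = [handle_login(state, True) for _ in range(4)]
assert results == ["warn", "warn", "warn", "force_reset"]
```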
    <div>
      <h3>What’s next?</h3>
    </div>
    <p>Forcing users to reset compromised credentials helps prevent attacks from spreading on the Internet, but it’s just a small piece of improved account security. We know that adding the next step, two-factor authentication, can be cumbersome. We have committed to <a href="https://www.cisa.gov/securebydesign">CISA’s Secure by Design Pledge</a>, which includes working to increase 2FA adoption across the industry. We will share our plans on how we will be implementing the pledge by mid-2025.</p><p>Adding multifactor authentication to every one of your accounts on the Internet can still be a chore, no matter how much the experience is improved. It’s much easier if you can just do it with one account and use that account to authenticate into other services and applications – a single sign-on (SSO) flow. Right now, our SSO feature is limited to enterprise accounts, and we plan to change that. We will allow users to access Cloudflare through other providers like Google, GitHub, and more. Allowing users to reduce how many unique password and 2FA combinations they have to keep track of helps to reduce the likelihood of being impacted by future password breaches.</p> ]]></content:encoded>
            <category><![CDATA[Passwords]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">OEX1mcgZrCSgEplJZHr1h</guid>
            <dc:creator>Garrett Galow</dc:creator>
        </item>
        <item>
            <title><![CDATA[Research Directions in Password Security]]></title>
            <link>https://blog.cloudflare.com/research-directions-in-password-security/</link>
            <pubDate>Thu, 14 Oct 2021 12:59:14 GMT</pubDate>
            <description><![CDATA[ We've been studying password problems, including malicious logins using compromised credentials. Here's what we learned and here's where we think we can go from here with safer password systems. ]]></description>
            <content:encoded><![CDATA[ <p>As Internet users, we all deal with passwords every day. With so many different services, each with their own login systems, we have to somehow keep track of the credentials we use with each of these services. This situation leads some users to delegate credential storage to password managers like LastPass or a browser-based password manager, but this is far from universal. Instead, many people still rely on old-fashioned human memory, which has its limitations — leading to reused passwords and to security problems. This blog post discusses how Cloudflare Research is exploring how to minimize password exposure and thwart password attacks.</p>
    <div>
      <h2>The Problem of Password Reuse</h2>
      <a href="#the-problem-of-password-reuse">
        
      </a>
    </div>
    <p>Because it’s too difficult to remember many distinct passwords, people often reuse them across different online services. When breached password datasets are leaked online, attackers can take advantage of these to conduct “credential stuffing attacks”. In a credential stuffing attack, an attacker tests breached credentials against multiple online login systems in an attempt to hijack user accounts. These attacks are highly effective because users tend to reuse the same credentials across different websites, and they have quickly become one of the most prevalent types of online guessing attacks. Automated attacks can be run at a large scale, testing out exposed passwords across multiple systems, under the assumption that some of these passwords will unlock accounts somewhere else (if they have been reused). When a data breach is detected, users of that service will likely receive a security notification and will reset that account password. However, if this password was reused elsewhere, they may easily forget that it needs to be changed for those accounts as well.</p><p>How can we protect against credential stuffing attacks? There are a number of methods that have been deployed — with varying degrees of success. Password managers address the problem of remembering a strong, unique password for every account, but many users have yet to adopt them. Multi-factor authentication is another potential solution — that is, using another form of authentication in addition to the username/password pair. This can work well, but has limits: for example, such solutions may rely on specialized hardware that not all clients have. Consumer systems are often reluctant to mandate multi-factor authentication, given concerns that people may find it too complicated to use; companies do not want to deploy something that risks impeding the growth of their user base.</p><p>Since there is no perfect solution, security researchers continue to try to find improvements. 
Two different approaches we will discuss in this blog post are hardening password systems using cryptographically secure keys, and detecting the reuse of compromised credentials, so they don’t leave an account open to guessing attacks.</p>
    <div>
      <h2>Improved Authentication with PAKEs</h2>
      <a href="#improved-authentication-with-pakes">
        
      </a>
    </div>
    <p>Investigating how to securely authenticate a user just using what they can remember has been an important area in secure communication. To this end, the subarea of cryptography known as Password Authenticated Key Exchange (PAKE) came about. PAKEs deal with protocols for establishing cryptographically secure keys where the only source of authentication is a human memorizable (low-entropy, attacker-guessable) password — that is, the “what you know” side of authentication.</p><p>Before diving into the details, we’ll provide a high-level overview of the basic problem. Although passwords are typically protected in transit by being sent over HTTPS, servers handle them in <i>plaintext</i> to verify them once they arrive. Handling plaintext passwords increases security risk — for instance, they might get inadvertently logged and exposed. Ideally, the user’s password never gets sent to the server in the first place. This is where PAKEs come in — a means of verifying that the user and server share a password, ideally without revealing information about the password that could help attackers to discover or crack it.</p>
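    <p>To make the contrast concrete, here is a minimal sketch (ours, not production code) of the conventional salted-hash flow that PAKEs aim to improve on, using only Python’s standard library. Note the limitation in <code>verify</code>: the server still has to receive the plaintext password in order to check it.</p>

```python
import hashlib
import hmac
import os

def register(password: str) -> tuple[bytes, bytes]:
    """Store only a salted, slow hash of the password -- never the plaintext."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return salt, digest

def verify(password: str, salt: bytes, digest: bytes) -> bool:
    """The catch: the server must receive the plaintext password to re-hash it."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return hmac.compare_digest(candidate, digest)

salt, digest = register("correct horse battery staple")
assert verify("correct horse battery staple", salt, digest)
assert not verify("hunter2", salt, digest)
```

    <p>A PAKE removes exactly that catch: the check succeeds or fails without the plaintext ever reaching the server.</p>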
    <div>
      <h3>A few words on PAKEs</h3>
      <a href="#a-few-words-on-pakes">
        
      </a>
    </div>
    <p>PAKE protocols let two parties turn a password into a shared key. Each party only gets one guess at the password the other holds. If a user tries to log in to the wrong server with a PAKE, that server will not be able to turn around and impersonate the user. As such, PAKEs guarantee that communication with one of the parties is the only way for an attacker to test their (single) password guess. This may seem like an unneeded level of complexity when we could use already available tools like a key distribution mechanism along with <a href="/opaque-oblivious-passwords/">password-over-TLS</a>, but this puts a lot of trust in the service. You may trust a service with learning your password on that service, but what if you accidentally use a password for a different service when trying to log in? Note the particular risks of a reused password: it is no longer just a secret shared between a user and a <i>single</i> service, but is now a secret shared between a user and <i>multiple</i> services. This therefore increases the password’s privacy sensitivity — a service should not know users’ account login information for <i>other</i> services.</p>
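    <p>As a toy illustration of this idea, here is a greatly simplified SPAKE2-style exchange over a demonstration group (the parameters are our own illustrative choices; real PAKEs use standardized elliptic-curve groups and constants). Each side masks its Diffie-Hellman share with a password-derived factor, so matching passwords yield matching keys, while a wrong guess yields a useless key.</p>

```python
import hashlib
import secrets

# Toy parameters, purely for illustration -- NOT a secure choice of group.
p = 2**127 - 1          # a Mersenne prime as the group modulus
g = 3                   # toy generator

def h2int(*parts: bytes) -> int:
    return int.from_bytes(hashlib.sha256(b"|".join(parts)).digest(), "big")

# Public protocol constants (in real SPAKE2 these are fixed curve points).
M = pow(g, h2int(b"const-M"), p)
N = pow(g, h2int(b"const-N"), p)

def spake2_demo(password_a: str, password_b: str) -> tuple[bytes, bytes]:
    """Run a simplified SPAKE2-style exchange; return each side's derived key."""
    wa = h2int(password_a.encode())
    wb = h2int(password_b.encode())
    x = secrets.randbelow(p - 2) + 1                 # A's ephemeral secret
    y = secrets.randbelow(p - 2) + 1                 # B's ephemeral secret
    X = (pow(g, x, p) * pow(M, wa, p)) % p           # A -> B: masked DH share
    Y = (pow(g, y, p) * pow(N, wb, p)) % p           # B -> A: masked DH share
    # Each side strips the other's password mask, then finishes Diffie-Hellman.
    ka = pow(Y * pow(pow(N, wa, p), -1, p) % p, x, p)   # A's view of g^(xy)
    kb = pow(X * pow(pow(M, wb, p), -1, p) % p, y, p)   # B's view of g^(xy)
    return (hashlib.sha256(ka.to_bytes(16, "big")).digest(),
            hashlib.sha256(kb.to_bytes(16, "big")).digest())

ka, kb = spake2_demo("hunter2", "hunter2")
assert ka == kb          # matching passwords -> matching keys
ka, kb = spake2_demo("hunter2", "hunter3")
assert ka != kb          # a wrong guess produces a key that opens nothing
```

    <p>Notice that neither side ever transmits the password itself, and a party holding the wrong password learns nothing beyond the failure of its single guess.</p>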
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/953CGN6o81Ak25ZlQLpdL/3b467111621985bded03b137ad464c1c/ifYbW6e-iYo9uU3KtBcbMzgURxbxi5Q-29EBuZT46b9RypGo3zco16W3sx9UuACOVKTpKe6ZiJq3QUoZQM3v9_G22YDk6cQJeXIVOlRbjP2C0ssQrMVG7qhDM7XM.png" />
            
            </figure><p>A comparison of shared secrets between passwords over TLS versus PAKEs. With passwords over TLS, a service might learn passwords used on another service. This problem does not arise with PAKEs.</p><p>PAKE protocols are built with the assumption that the server isn’t always working in the best interest of the client and, what’s more, cannot use any kind of public-key infrastructure during login (although it doesn’t hurt to have both!). This precludes the user from sending their plaintext password (or any information that could be used to derive it — in a computational sense) to the server during login.</p><p>PAKE protocols have expanded into new territory since the <a href="https://scholar.google.com/scholar?cluster=2364773997049033938&amp;hl=en&amp;as_sdt=0,31">seminal EKE paper of Bellovin and Merritt</a>, where the client and server both remembered a plaintext version of the password. As mentioned above, when the server stores the plaintext password, the client risks having the password logged or leaked. To address this, new protocols were developed, referred to as augmented, verifier-based, or <b>asymmetric PAKE</b>s (aPAKEs), where the server stored a modified version (similar to a hash) of the password instead of the plaintext password. This mirrors the way many of us were <a href="/keeping-passwords-safe-by-staying-up-to-date/">taught to store passwords</a> in a database, specifically as a hash of the password with accompanying salt and pepper. However, in these cases, attackers can still use traditional methods of attack such as targeted rainbow tables. To avoid these kinds of attacks, a new kind of PAKE was born, the <b>strong asymmetric PAKE</b> (saPAKE).</p><p><a href="https://scholar.google.com/scholar?cluster=5047618283495879801&amp;hl=en&amp;as_sdt=0,31">OPAQUE was the first saPAKE</a> and it guarantees defense against precomputation by hiding the password dictionary itself! 
It does this by replacing the noninteractive hash function with an interactive protocol referred to as an Oblivious Pseudorandom Function (OPRF) where one party inputs their “salt”, another inputs their “password”, and only the password-providing party learns the output of the function. The fact that the password-providing party learns nothing (computationally) about the salt prevents offline precomputation by disallowing an attacker from evaluating the function in their head.</p><p>Another way to think about the three PAKE paradigms has to do with how each of them treats the password dictionary:</p><table><thead><tr><th>PAKE type</th><th>Password dictionary</th><th>Threat model</th></tr></thead><tbody><tr><td>PAKE</td><td>The password dictionary is public and common to every user.</td><td>Without any guessing, the attacker learns the user’s password upon compromise of the server.</td></tr><tr><td>aPAKE</td><td>Each user gets their own password dictionary; a description of the dictionary (e.g., the “salt”) is leaked to the client when they attempt to log in.</td><td>The attacker must perform an independent precomputation for each client they want to attack.</td></tr><tr><td>saPAKE (e.g., OPAQUE)</td><td>Each user gets their own password dictionary; the server only provides an online interface (the OPRF) to the dictionary.</td><td>The adversary must wait until after they compromise the server to run an offline attack on the user’s password<sup>1</sup>.</td></tr></tbody></table><p>OPAQUE also goes one step further and allows the user to perform the password transformation on their own device so that the server doesn’t see the plaintext password during registration either. 
Cloudflare Research has been involved with OPAQUE for a while now — for instance, you can read about our previous <a href="/opaque-oblivious-passwords/">implementation work and demo</a> if you want to learn more.</p><p>But OPAQUE is not a panacea: in the event of server compromise, the attacker can learn the salt that the server uses to evaluate the OPRF and can still run the same offline attack that was available in the aPAKE world, although this is now considerably more time-consuming and can be made increasingly difficult through the use of memory-hard hash functions like scrypt. This means that despite our best efforts, when a server is breached, the attacker can eventually come out with a list of plaintext passwords. Indeed, this attack is ultimately unavoidable, as the attacker can always run the (sa)PAKE protocol in their head acting as both parties to test each password. With this being the case, we still need to take steps to defend against automated password attacks such as credential stuffing, and to have ways of mitigating them.</p>
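    <p>The OPRF at the heart of OPAQUE can be sketched with blinded Diffie-Hellman. The parameters and the password-to-group map below are toy values of our own choosing, for illustration only; real deployments use elliptic-curve groups and a proper hash-to-group function. The server applies its secret key <code>k</code> (playing the role of the salt) to a blinded password element without ever seeing the password, and the client unblinds the result to recover P<sup>k</sup>.</p>

```python
import hashlib
import math
import secrets

# Toy prime-field group, purely illustrative -- NOT a secure choice.
p = 2**127 - 1
g = 3
order = p - 1

def hash_to_group(data: bytes) -> int:
    """Naive password-to-group-element map (illustrative only)."""
    e = int.from_bytes(hashlib.sha256(data).digest(), "big") % order
    return pow(g, e, p)

# --- client side: blind the password ---
def blind(password: bytes) -> tuple[int, int]:
    while True:
        r = secrets.randbelow(order - 1) + 1
        if math.gcd(r, order) == 1:      # r must be invertible mod the order
            break
    P = hash_to_group(password)
    return r, pow(P, r, p)               # server sees only P^r, a blinded element

# --- server side: apply its secret OPRF key k ---
def evaluate(blinded: int, k: int) -> int:
    return pow(blinded, k, p)            # (P^r)^k

# --- client side: unblind to recover P^k ---
def unblind(evaluated: int, r: int) -> int:
    r_inv = pow(r, -1, order)
    return pow(evaluated, r_inv, p)      # ((P^r)^k)^(1/r) = P^k

k = secrets.randbelow(order - 1) + 1     # the server's "salt", never revealed
r, B = blind(b"hunter2")
out = unblind(evaluate(B, k), r)
assert out == pow(hash_to_group(b"hunter2"), k, p)   # client obtains P^k ...
# ... while the server saw only B = P^r, which hides the password.
```

    <p>Because the client cannot evaluate the function without the server’s <code>k</code>, an attacker cannot precompute a dictionary offline; each guess requires an online interaction.</p>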
    <div>
      <h2>Are You Overexposed?</h2>
      <a href="#are-you-overexposed">
        
      </a>
    </div>
    <p>To help detect and respond to credential stuffing, Cloudflare recently <a href="/account-takeover-protection/">rolled out the Exposed Credential Checks feature</a> on the <a href="https://www.cloudflare.com/learning/ddos/glossary/web-application-firewall-waf/">Web Application Firewall (WAF)</a>, which can alert the origin if a user’s login credentials have appeared in a recent breach. Historically, compromised credential checking services have allowed users to be proactive against credential stuffing attacks when their username and password appear together in a breach. However, they do not account for <a href="https://scholar.google.com/scholar?cluster=1197873264049178787&amp;hl=en&amp;as_sdt=0,31">recently proposed credential tweaking attacks</a>, in which an attacker tries <i>variants</i> of a breached password, under the assumption that users often use slight modifications of the same password for different accounts, such as “sunshineFB”, “sunshineIG”, and so on. Therefore, compromised credential check services should incorporate methods of checking for credential tweaks.</p><p>Under the hood, Cloudflare’s Exposed Credential Checks feature relies on an underlying protocol dubbed <a href="https://scholar.google.com/scholar?cluster=13673195091406912846&amp;hl=en&amp;as_sdt=0,31">Might I Get Pwned (MIGP)</a>. MIGP uses the bucketization method proposed in <a href="https://scholar.google.com/scholar?cluster=11401588548541554564&amp;hl=en&amp;as_sdt=0,31">Li et al.</a> to avoid sending the plaintext username or password to the server while handling a large breach dataset. After receiving a user’s credentials, MIGP hashes the username and sends a portion of that hash as a “bucket identifier” to the server. 
The client and server can then perform a private membership test protocol to verify whether the user’s username/password pair appeared in that bucket, without ever having to send plaintext credentials to the server.</p><p>Unlike previous compromised credential check services, MIGP also enables credential tweaking checks by augmenting the original breach dataset with a set of password “variants”. For each leaked password, it generates a list of password variants, which are labeled as such to differentiate them from the original leaked password and added to the original dataset. For more information, you can check out the Cloudflare Research <a href="/privacy-preserving-compromised-credential-checking">blog post</a> detailing our open-source implementation and deployment of the MIGP protocol.  </p>
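    <p>To give a feel for the bucketization step, here is a hypothetical, greatly simplified sketch of a MIGP-like check (the helper names and the tiny variant generator are our own illustrative stand-ins): the client reveals only a short prefix of its username hash, and the server’s dataset includes labeled password variants to catch credential tweaking. The real protocol replaces the local lookup below with a private membership test, so neither side learns the other’s set.</p>

```python
import hashlib
from collections import defaultdict

BUCKET_BITS = 16   # real deployments tune this; small and fixed here

def cred_hash(username: str, password: str) -> bytes:
    return hashlib.sha256(f"{username}\x00{password}".encode()).digest()

def bucket_id(username: str) -> int:
    """Clients reveal only this prefix of the username hash -- k-anonymity."""
    h = hashlib.sha256(username.encode()).digest()
    return int.from_bytes(h[:2], "big") % (1 << BUCKET_BITS)

def tweaks(password: str) -> list[str]:
    """A tiny stand-in for MIGP's variant generator."""
    return [password + "1", password + "!", password.capitalize()]

# Server: index breached credentials (and labeled variants) by bucket.
buckets = defaultdict(dict)   # bucket id -> {credential hash: label}

def ingest_breach(username: str, password: str) -> None:
    b = buckets[bucket_id(username)]
    b[cred_hash(username, password)] = "exact"
    for v in tweaks(password):
        b.setdefault(cred_hash(username, v), "variant")

# Client: fetch one bucket, test membership locally (the real protocol
# performs this step privately, without downloading the bucket contents).
def check(username: str, password: str):
    return buckets[bucket_id(username)].get(cred_hash(username, password))

ingest_breach("alice", "sunshine")
assert check("alice", "sunshine") == "exact"      # exact breached pair
assert check("alice", "sunshine1") == "variant"   # credential-tweaking hit
assert check("alice", "tr0ub4dor&3") is None      # clean credential
assert check("bob", "sunshine") is None           # same password, other user
```
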
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6s0cOYGpG1oabqonc0mF9O/11882b44b6803b77a7f7235c0e4efb01/image2-24.png" />
            
            </figure>
    <div>
      <h2>Measuring Credential Compromises</h2>
      <a href="#measuring-credential-compromises">
        
      </a>
    </div>
    <p>The question remains, just how important are these exposed credential checks for detecting and preventing credential stuffing attacks in practice? To answer this question, the Research Team has initiated a study investigating login requests to our own Cloudflare dashboard. For this study, we are collecting the data logged by Cloudflare’s Exposed Credential Check feature (described above), designed to be privacy-preserving: this check does not reveal a password, but provides a “yes/no” response on whether the submitted credentials appear in our breach dataset. Along with this signal, we are looking at other fields that may be indicative of malicious behavior such as <a href="/mitigating-bot-attacks-against-cloudflare/">bot score</a> and IP reputation. As this project develops, we plan to cluster the data to find patterns of different types of credential stuffing attacks that we can generalize to form attack fingerprints. We can then feed these fingerprints into the alert logs for the Cloudflare Detection &amp; Response team to see if they provide useful information for the security analysts.</p><p>Additionally, we hope to investigate potential post-compromise behavior as it relates to these compromise check fields. After an attacker successfully hijacks an account, they may take a number of actions such as changing the password, revoking all valid access tokens, or setting up a malicious script. By analyzing compromised credential checks along with these signals, we may be able to better differentiate benign from malicious behavior.</p>
    <div>
      <h3>Future directions: OPAQUE and MIGP combined</h3>
      <a href="#future-directions-opaque-and-migp-combined">
        
      </a>
    </div>
    <p>This post has discussed how we’re approaching the problem of preventing credential stuffing attacks from two different angles. Through the deployment and analysis of compromised credential checks, we aim to prevent account compromise by detecting and preventing credential stuffing attacks before they happen. In addition, in the case that a server does get compromised, the wider use of OPAQUE would help address the problem of leaking passwords to an attacker by avoiding the reception and storage of plaintext passwords on the server as well as preventing precomputation attacks.</p><p>However, there are still research challenges to address. Notably, the current method for interfacing with MIGP still requires the server to either pass along a plaintext version of the client’s password, or trust the client to honestly communicate with the MIGP service on behalf of the server. If we want to leverage the security guarantees of OPAQUE (or generally an saPAKE) with the analytics and alert system provided by MIGP in a privacy-preserving way, we need additional mechanisms.</p><p>At first glance, the privacy-preserving goals of both protocols seem to be perfect matches for each other. Both OPAQUE and MIGP are built upon the idea of replacing the traditional salted password hashes with an OPRF as a way of keeping the client’s plaintext passwords from ever leaving their device. However, both the interfaces for these protocols rely on user-provided inputs which aren’t cryptographically tied to each other. This allows an attacker to provide a false password to MIGP while providing their actual password to the OPAQUE server. Further, the security analyses of both protocols assume that their idealized building blocks are separated in an important way. 
This isn’t to say that the two protocols are incompatible, and indeed, much of their machinery can be reused.</p><p>The next stage for password privacy will be an integration of these two protocols, such that a server can be made aware of credential stuffing attacks and patterns of compromised account usage, protecting it against the compromise of other servers while providing the same privacy guarantees OPAQUE does. Our goal is to allow you to protect yourself from <i>other</i> compromised servers while protecting your clients from compromise of <i>your</i> server. Stay tuned for updates!</p><p>We’re always keen to collaborate with others to build more secure systems, and would love to hear from those interested in password research. You can reach us with questions, comments, and research ideas at <a href="#">ask-research@cloudflare.com</a>. For those interested in joining our team, please visit our <a href="https://www.cloudflare.com/careers/jobs/?department=Technology%20Research&amp;location=default">Careers Page</a>.</p><p>...</p><p><sup>1</sup>There are other ways of constructing saPAKE protocols. The curious reader can see <a href="https://scholar.google.com/scholar?cluster=7458935618932751816&amp;hl=en&amp;as_sdt=0,31">this CRYPTO 2019 paper</a> for details.</p> ]]></content:encoded>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Salt]]></category>
            <category><![CDATA[Passwords]]></category>
            <guid isPermaLink="false">5KZNCnkEzTgAysfN3nPrLX</guid>
            <dc:creator>Ian McQuoid</dc:creator>
            <dc:creator>Marina Sanusi</dc:creator>
            <dc:creator>Tara Whalen</dc:creator>
        </item>
        <item>
            <title><![CDATA[Helping build the next generation of privacy-preserving protocols]]></title>
            <link>https://blog.cloudflare.com/next-generation-privacy-protocols/</link>
            <pubDate>Tue, 08 Dec 2020 12:00:00 GMT</pubDate>
            <description><![CDATA[ Today, we’re making several announcements around improving Internet protocols with respect to something important to our customers and Internet users worldwide: privacy. ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3KjEAqn2Lizr1zW42YzTU4/6492bcae03200a5c1688671ecc3b6291/Privacy-protocols-2.png" />
            
            </figure><p>Over the last ten years, Cloudflare has become an important part of Internet infrastructure, powering websites, APIs, and web services to help make them more secure and efficient. The Internet is growing in terms of its capacity and the number of people using it and evolving in terms of its design and functionality. As a player in the Internet ecosystem, Cloudflare has a responsibility to help the Internet grow in a way that respects and provides value for its users. Today, we’re making several announcements around improving Internet protocols with respect to something important to our customers and Internet users worldwide: privacy.</p><p>These initiatives are:</p><ul><li><p>Fixing one of the last information leaks in HTTPS through <a href="/encrypted-client-hello"><b>Encrypted Client Hello (ECH)</b></a><b>,</b> previously known as <a href="/encrypted-sni/">Encrypted SNI</a></p></li><li><p>Making <a href="https://www.cloudflare.com/learning/dns/what-is-dns/">DNS</a> even more private by supporting <a href="/oblivious-dns"><b>Oblivious DNS-over-HTTPS (ODoH)</b></a></p></li><li><p>Developing a superior protocol for password authentication, <a href="/opaque-oblivious-passwords"><b>OPAQUE</b></a>, that makes password breaches less likely to occur</p></li></ul><p>Each of these projects impacts an aspect of the Internet that influences our online lives and digital footprints. Whether we know it or not, there is a lot of private information about us and our lives floating around online. This is something we can help fix.</p><p>For over a year, we have been working through standards bodies like the IETF and partnering with the biggest names in Internet technology (including Mozilla, Google, Equinix, and more) to design, deploy, and test these new privacy-preserving protocols at Internet scale. 
Each of these three protocols touches on a critical aspect of our online lives, and we expect them to help make real improvements to privacy online as they gain adoption.</p>
    <div>
      <h3>A continuing tradition at Cloudflare</h3>
      <a href="#a-continuing-tradition-at-cloudflare">
        
      </a>
    </div>
    <p>One of Cloudflare’s core missions is to support and develop technology that helps build a better Internet. As an industry, we’ve made exceptional progress in making the Internet more secure and robust. Cloudflare is proud to have played a part in this progress through multiple initiatives over the years.</p><p>Here are a few highlights:</p><ul><li><p><a href="/introducing-universal-ssl/"><b>Universal SSL</b></a>™. We’ve been one of the driving forces for encrypting the web. We launched Universal SSL in 2014 to give website encryption to our customers for free and have actively been working along with certificate authorities like Let’s Encrypt, web browsers, and website operators to help remove <a href="/tag/mixed-content-errors/">mixed content</a>. Before Universal SSL launched to give all Cloudflare customers HTTPS for free, only 30% of connections to websites were encrypted. Through the industry’s efforts, that number is now <a href="https://letsencrypt.org/stats/">80%</a> -- and a much more significant proportion of overall Internet traffic. Along with doing our part to encrypt the web, we have supported the Certificate Transparency project via <a href="/introducing-certificate-transparency-and-nimbus/">Nimbus</a> and <a href="https://ct.cloudflare.com/">Merkle Town</a>, which has improved accountability for the certificate ecosystem HTTPS relies on for trust.</p></li><li><p><b>TLS 1.3 and QUIC</b>. We’ve also been a proponent of upgrading existing security protocols. Take Transport Layer Security (TLS), the underlying protocol that secures HTTPS. Cloudflare engineers helped contribute to the design of TLS 1.3, the latest version of the standard, and <a href="/introducing-tls-1-3/">in 2016</a> we launched support for an early version of the protocol. This early deployment helped lead to improvements to the final version of the protocol. 
TLS 1.3 is now the most widely used encryption protocol on the web and a vital component of the <a href="/last-call-for-quic/">emerging QUIC standard</a>, of which we were also early adopters.</p></li><li><p><b>Securing Routing, Naming, and Time</b>. We’ve made major efforts to help secure other critical components of the Internet. Our efforts to help secure Internet routing through our <a href="/cloudflares-rpki-toolkit/">RPKI toolkit</a>, <a href="https://conferences.sigcomm.org/imc/2019/presentations/p221.pdf">measurement studies</a>, and “<a href="/is-bgp-safe-yet-rpki-routing-security-initiative/">Is BGP Safe Yet</a>” tool have significantly improved the Internet’s resilience against disruptive route leaks. Our time service (<a href="/secure-time/">time.cloudflare.com</a>) has helped keep people’s clocks in sync with more secure protocols like <a href="/nts-is-now-rfc/">NTS</a> and <a href="/roughtime/">Roughtime</a>. We’ve also made DNS more secure by supporting <a href="/dns-encryption-explained/">DNS-over-HTTPS and DNS-over-TLS</a> in 1.1.1.1 at launch, along with one-click DNSSEC in our <a href="/introducing-universal-dnssec/">authoritative DNS</a> service and <a href="/one-click-dnssec-with-cloudflare-registrar/">registrar</a>.</p></li></ul><p>Continuing to improve the security of the systems of trust online is critical to the Internet’s growth. However, there is a more fundamental principle at play: respect. The infrastructure underlying the Internet should be designed to respect its users.</p>
    <div>
      <h3>Building an Internet that respects users</h3>
      <a href="#building-an-internet-that-respects-users">
        
      </a>
    </div>
    <p>When you sign in to a specific website or service with a privacy policy, you know what that site is expected to do with your data. It’s explicit. Users have no such visibility into the operators of the Internet itself. You may have an agreement with your Internet Service Provider (ISP) and the site you’re visiting, but it’s doubtful that you even know which <a href="http://www.washingtonpost.com/graphics/national/security-of-the-internet/bgp">networks your data is traversing</a>. Most people don’t have a concept of the Internet beyond what they see on their screen, so it’s hard to imagine that people would accept or even understand what a privacy policy from a <a href="/the-relative-cost-of-bandwidth-around-the-world/">transit wholesaler</a> or an <a href="https://us-cert.cisa.gov/ncas/alerts/TA17-075A">inspection middlebox</a> would mean.</p><p>Without encryption, Internet browsing information is implicitly shared with countless third parties online as information passes between networks. Without secure routing, users’ traffic can be hijacked and disrupted. Without privacy-preserving protocols, users’ online life is not as private as they would think or expect. The infrastructure of the Internet wasn’t built in a way that reflects their expectations.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7rjHEqRERPxkcFeoNRwfAX/37548cf8be78a4849c9a188c076ca483/image3.png" />
            
            </figure><p>Normal network flow</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/76hMzZSOArtOdRiTWCeXqH/b2063a3b7aef6e410f30efcbc242f4b6/image1-7.png" />
            
            </figure><p>Network flow with malicious route leak</p><p>The good news is that the Internet is continuously evolving. One of the groups that help guide that evolution is the Internet Architecture Board (IAB). The IAB provides architectural oversight to the Internet Engineering Task Force (IETF), the Internet’s main standard-setting body. The IAB recently published <a href="https://www.rfc-editor.org/rfc/rfc8890.html">RFC 8890</a>, which states that individual end-users should be prioritized when designing Internet protocols. It says that if there’s a conflict between the interests of end-users and the interest of service providers, corporations, or governments, IETF decisions should favor end users. One of the prime interests of end-users is the right to privacy, and the IAB published <a href="https://tools.ietf.org/html/rfc6973">RFC 6973</a> to indicate how Internet protocols should take privacy into account.</p><p>Today’s technical blog posts are about <b>improvements to the Internet designed to respect user privacy</b>. Privacy is a complex topic that spans multiple disciplines, so it’s essential to clarify what we mean by “improving privacy.” We are specifically talking about changing the protocols that handle privacy-sensitive information exposed “on-the-wire” and modifying them so that this data is exposed to fewer parties. This data continues to exist. It’s just no longer available or visible to third parties without building a mechanism to collect it at a higher layer of the Internet stack, the application layer. <i>These changes go beyond website encryption</i>; they go deep into the design of the systems that are foundational to making the Internet what it is.</p>
    <div>
      <h3>The toolbox: cryptography and secure proxies</h3>
      <a href="#the-toolbox-cryptography-and-secure-proxies">
        
      </a>
    </div>
    <p>Two tools for making sure data can be used without being seen are <i>cryptography</i> and <i>secure</i> <i>proxies</i>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/71bC5CqEyrYCZ0RpSJGbHI/922ebc973778951111a1a1881b978e71/Cryptography-and-Secure-Proxies.png" />
            
            </figure><p>Cryptography allows information to be transformed into a format that a very limited number of people (those with the key) can understand. Some describe cryptography as a tool that transforms data security problems into key management problems. This is a humorous but fair description. Cryptography makes it easier to reason about privacy because only key holders can view data.</p><p>Another tool for protecting access to data is isolation/segmentation. By physically limiting which parties have access to information, you effectively build privacy walls. A popular architecture is to rely on policy-aware proxies to pass data from one place to another. Such proxies can be configured to strip sensitive data or block data transfers between parties according to what the privacy policy says.</p><p>Both these tools are useful individually, but they can be even more effective if combined. Onion routing (the cryptographic technique <a href="/cloudflare-onion-service/">underlying Tor</a>) is one example of how proxies and encryption can be used in tandem to enforce strong privacy. Broadly, if party A wants to send data to party B, they can encrypt the data with party B’s key and encrypt the metadata with a proxy’s key and send it to the proxy.</p><p>Platforms and services built on top of the Internet can build in consent systems, like privacy policies presented through user interfaces. The infrastructure of the Internet relies on layers of underlying protocols. Because these layers of the Internet are so far below where the user interacts with them, it’s almost impossible to build a concept of user consent. In order to respect users and protect them from privacy issues, the protocols that glue the Internet together should be designed with privacy enabled by default.</p>
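    <p>The onion-routing pattern described above can be sketched in a few lines. This toy substitutes a hash-based XOR stream for real encryption (illustration only; a real system uses authenticated encryption and public-key wrapping): the proxy can peel its layer to learn the next hop, but only party B can open the inner layer.</p>

```python
import hashlib
import json

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher (illustration only -- use a real AEAD in practice)."""
    out = bytearray()
    for block in range(0, len(data), 32):
        pad = hashlib.sha256(key + block.to_bytes(8, "big")).digest()
        chunk = data[block:block + 32]
        out.extend(x ^ y for x, y in zip(chunk, pad))
    return bytes(out)

# Party A wraps the message for B, then wraps routing metadata for the proxy.
def wrap(message: bytes, b_key: bytes, proxy_key: bytes) -> bytes:
    inner = keystream_xor(b_key, message)                  # only B can open this
    envelope = json.dumps({"next_hop": "B",                # metadata for the proxy
                           "payload": inner.hex()}).encode()
    return keystream_xor(proxy_key, envelope)

# The proxy peels one layer: it learns where to forward, not what is inside.
def proxy_peel(onion: bytes, proxy_key: bytes):
    envelope = json.loads(keystream_xor(proxy_key, onion))
    return envelope["next_hop"], bytes.fromhex(envelope["payload"])

b_key, proxy_key = b"B-secret-key", b"proxy-secret-key"
onion = wrap(b"meet at noon", b_key, proxy_key)
next_hop, inner = proxy_peel(onion, proxy_key)
assert next_hop == "B"                                  # proxy sees routing metadata
assert inner != b"meet at noon"                         # ...but not the plaintext
assert keystream_xor(b_key, inner) == b"meet at noon"   # B recovers the message
```

    <p>The separation of duties is the point: the party that routes the data and the party that reads it hold different keys, so neither sees more than it needs.</p>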
    <div>
      <h3>Data vs. metadata</h3>
      <a href="#data-vs-metadata">
        
      </a>
    </div>
    <p>The transition from a mostly unencrypted web to an encrypted web has done a lot for end-user privacy. For example, the “<a href="https://codebutler.com/2010/10/24/firesheep/">coffeeshop stalker</a>” is no longer an issue for most sites. When accessing the majority of sites online, users are no longer broadcasting every aspect of their web browsing experience (search queries, browser versions, authentication cookies, etc.) over the Internet for any participant on the path to see. If a site is configured correctly to use HTTPS, users can be confident their data is secure from onlookers and reaches only the intended party because their connections are both encrypted and authenticated.</p><p>However, HTTPS only protects the <i>content</i> of web requests. Even if you only browse sites over HTTPS, that doesn’t mean that your <i>browsing</i> <i>patterns</i> are private. This is because HTTPS fails to encrypt a critical aspect of the exchange: the metadata. When you make a phone call, the metadata is the phone number, not the call’s contents. Metadata is the data about the data.</p><p>To illustrate the difference and why it matters, here’s a diagram of what happens when you visit a website like an imageboard. Say you’re going to a specific page on that board (<a href="https://images.com/room101/">https://images.com/room101/</a>) that has embedded images hosted on another, more embarrassing site.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2WDSyOoFNRtXcGj5XXQC9p/b1c9ce791e8d84798b93782c97703c37/image5-2.png" />
            
            </figure><p>Page load for an imageboard, returning an HTML page with an image from an embarrassing site</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6DvbyOK8cIcvblLUmPsQCl/c994c812f99f917e5ae4c86898da827c/image4.png" />
            
            </figure><p>Subresource fetch for the image from an embarrassing site</p><p>The space inside the dotted line here represents the part of the Internet that your data needs to transit. These include your local area network or coffee shop’s network, your ISP, an Internet transit provider, and possibly the network portion of the cloud provider that hosts the server. Users often don’t have a relationship with these entities, or a contract preventing them from doing anything with that data. And even if those entities don’t look at the data, a well-placed observer intercepting Internet traffic could see anything sent unencrypted. Ideally, they wouldn’t see it at all. In this example, the fact that the user visited .com can be seen by an observer, which is expected. However, though the page content is encrypted, an observer can still learn <i>which specific page you’ve visited</i>, because the embedded image’s hostname (.com) is also visible.</p><p>It’s a general rule that if data is available to on-path parties on the Internet, some of these on-path parties will use this data. It’s also true that these on-path parties need some metadata in order to facilitate the transport of this data. This tension is explored in <a href="https://www.rfc-editor.org/rfc/rfc8558.html">RFC 8558</a>, which explains how protocols should be designed thoughtfully, weighing too much metadata (bad for privacy) against too little (bad for operations).</p><p>In an ideal world, Internet protocols would be designed with the principle of least privilege. They would provide the minimum amount of information needed for the on-path parties (the pipes) to do the job of transporting the data to the right place and keep everything else confidential by default. Current protocols, including TLS 1.3 and QUIC, are important steps towards this ideal but fall short with respect to metadata privacy.</p>
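<p>As a rough illustration of the data/metadata split described above, here is a sketch of what a single HTTPS page load looks like from the wire. The field names are descriptive labels for this illustration, not actual protocol fields:</p>

```python
# Illustrative split of one HTTPS page load into what TLS encrypts
# (the data) and what an on-path observer can still see (the metadata).
page_load = {
    "encrypted": {                       # protected by TLS 1.3
        "path": "/room101/",
        "cookies": "session=...",
        "page_content": "<html>...</html>",
    },
    "visible_on_the_wire": {             # metadata TLS does not hide
        "destination_ip": "203.0.113.7",
        "sni_hostname": "example.com",   # the gap ECH is designed to close
        "dns_query": "example.com",      # the gap DoH/ODoH address
        "packet_sizes_and_timing": "...",
    },
}

def observer_view(load: dict) -> set:
    # An on-path party (coffee shop, ISP, transit provider) sees only
    # the metadata half of the split.
    return set(load["visible_on_the_wire"])

assert "sni_hostname" in observer_view(page_load)
assert "cookies" not in observer_view(page_load)
```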
    <div>
      <h3>Knowing both who you are and what you do online can lead to profiling</h3>
      <a href="#knowing-both-who-you-are-and-what-you-do-online-can-lead-to-profiling">
        
      </a>
    </div>
    <p>Today’s announcements reflect two metadata protection levels: the first involves limiting the amount of metadata available to third-party observers (like ISPs). The second involves restricting the amount of metadata that users share with service providers themselves.</p><p>Hostnames are an example of metadata that needs to be protected from third-party observers, which DoH and ECH intend to do. However, it doesn’t make sense to hide the hostname from the site you’re visiting. It also doesn’t make sense to hide it from a directory service like DNS. A DNS server needs to know which hostname you’re resolving to resolve it for you!</p><p>A privacy issue arises when a service provider knows about both what sites you’re visiting and who you are. Individual websites do not have this dangerous combination of information (except in the case of third-party cookies, which <a href="https://www.cnbc.com/2020/01/14/google-chrome-to-end-support-for-third-party-cookies-within-two-years.html">are going away soon in browsers</a>), but DNS providers do. Thankfully, it’s not actually necessary for a DNS resolver to know <i>both</i> the hostname of the service you’re going to and which IP you’re coming from. Disentangling the two, which is the goal of ODoH, is good for privacy.</p>
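<p>A minimal sketch of this disentangling, using a toy stand-in for ODoH’s real HPKE public-key encryption (the names and addresses here are made up), shows how the proxy and the resolver each see only half of the picture:</p>

```python
import hashlib
import os

def toy_seal(key: bytes, msg: str) -> bytes:
    # Stand-in for HPKE encryption under the target's public key
    # (illustration only -- a fixed-pad XOR, not real cryptography).
    pad = hashlib.sha256(key).digest()
    return bytes(a ^ b for a, b in zip(msg.ljust(32).encode(), pad))

def toy_open(key: bytes, ct: bytes) -> str:
    pad = hashlib.sha256(key).digest()
    return bytes(a ^ b for a, b in zip(ct, pad)).decode().rstrip()

target_key = os.urandom(32)   # in real ODoH, the target's public key

# Client: encrypts the DNS query under the target's key, sends it via a proxy.
query_ct = toy_seal(target_key, "example.com")

# Proxy: learns who is asking, but the query itself is an opaque blob.
proxy_view = {"client_ip": "198.51.100.4", "query": query_ct}

# Target resolver: learns what is asked, but only ever sees the proxy's address.
target_view = {"peer": "proxy", "hostname": toy_open(target_key, proxy_view["query"])}

assert target_view["hostname"] == "example.com"
assert b"example.com" not in proxy_view["query"]   # hostname hidden from proxy
```

<p>Neither party alone can link “who” to “what”, which is exactly the property the paragraph above describes.</p>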
    <div>
      <h3>The Internet is part of our infrastructure</h3>
      <a href="#the-internet-is-part-of-our-infrastructure">
        
      </a>
    </div>
    <p>Roads should be well paved, well lit, accurately signed, and optimally connected. They aren't designed to stop a car based on who's inside it. Nor should they be! Like transportation infrastructure, Internet infrastructure is responsible for getting data where it needs to go, not for looking inside packets and making judgments. But the Internet is made of computers and software, and software tends to be written to make decisions based on the data it has available to it.</p><p>Privacy-preserving protocols attempt to eliminate the temptation for infrastructure providers and others to peek inside and make decisions based on personal data. A non-privacy-preserving protocol like HTTP keeps data and metadata, like passwords, IP addresses, and hostnames, as explicit parts of the data sent over the wire. The fact that they are explicit means that they are available to any observer to collect and act on. A protocol like HTTPS improves upon this by making some of the data (such as passwords and site content) invisible on the wire using encryption.</p><p>The three protocols we are exploring today extend this concept.</p><ul><li><p><b>ECH</b> takes most of the unencrypted metadata in TLS (including the hostname) and encrypts it with a key that was fetched ahead of time.</p></li><li><p><b>ODoH</b> (a new variant of DoH co-designed by Apple, Cloudflare, and Fastly engineers) uses proxies and onion-like encryption to make the source of a DNS query invisible to the DNS resolver. This protects the user’s IP address when resolving hostnames.</p></li><li><p><b>OPAQUE</b> uses a new cryptographic technique to keep passwords hidden <b><i>even from the server</i></b>. Using a construction called an Oblivious Pseudo-Random Function (as seen in <a href="/privacy-pass-the-math/">Privacy Pass</a>), the server does not learn the password; it only learns whether or not the user knows it.</p></li></ul><p>By making sure Internet infrastructure acts more like physical infrastructure, user privacy is more easily protected. The Internet is more private if private data can only be collected where the user has a chance to consent to its collection.</p>
    <div>
      <h3>Doing it together</h3>
      <a href="#doing-it-together">
        
      </a>
    </div>
    <p>As much as we’re excited about working on new ways to make the Internet more private, innovation at a global scale doesn’t happen in a vacuum. Each of these projects is the output of a collaborative group of individuals working out in the open in organizations like the IETF and the IRTF. Protocols must come about through a consensus process that involves all the parties that make up the interconnected set of systems that power the Internet. From browser builders to cryptographers, from DNS operators to website administrators, this is truly a global team effort.</p><p>We also recognize that sweeping technical changes to the Internet will inevitably impact the technical community. Adopting these new protocols may have legal and policy implications. We are actively working with governments and civil society groups to help educate them about the impact of these potential changes.</p><p>We’re looking forward to sharing our work today and hope that more interested parties join in developing these protocols. The projects we are announcing today were designed together by experts from academia, industry, and the hobbyist community, and were built by engineers from Cloudflare Research (including the work of interns, which we will highlight) with support from everyone at Cloudflare.</p><p>If you’re interested in this type of work, <a href="https://www.cloudflare.com/careers/jobs/">we’re hiring</a>!</p> ]]></content:encoded>
            <category><![CDATA[Privacy Week]]></category>
            <category><![CDATA[Encryption]]></category>
            <category><![CDATA[Cryptography]]></category>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[DoH]]></category>
            <category><![CDATA[Authentication]]></category>
            <category><![CDATA[Passwords]]></category>
            <category><![CDATA[TLS]]></category>
            <category><![CDATA[Encrypted SNI]]></category>
            <category><![CDATA[Research]]></category>
            <guid isPermaLink="false">6Npild5sJTVfGo3GttHrTd</guid>
            <dc:creator>Nick Sullivan</dc:creator>
        </item>
        <item>
            <title><![CDATA[OPAQUE: The Best Passwords Never Leave your Device]]></title>
            <link>https://blog.cloudflare.com/opaque-oblivious-passwords/</link>
            <pubDate>Tue, 08 Dec 2020 12:00:00 GMT</pubDate>
            <description><![CDATA[ Imagine passwords for online services that never leave your device, encrypted or otherwise. OPAQUE is a new cryptographic protocol that makes this idea possible, giving you and only you full control of your password. ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/ooidY51NqRXSzRAvVf2Kc/d952f9c1e38f135a535cbd1641b03d0f/Opaque-Header-1.png" />
            
            </figure><p><i>Update: On January 19, 2022, we added </i><a href="https://opaque-full.research.cloudflare.com/"><i>a new demo for the OPAQUE protocol</i></a><i>.</i></p><p>Passwords are a problem. They are a problem for reasons that are familiar to most readers. For us at Cloudflare, the problem lies much deeper and broader. Most readers will immediately acknowledge that passwords are hard to remember and manage, especially as password requirements grow increasingly complex. Luckily, there are great software packages and browser add-ons to help manage passwords. Unfortunately, the greater underlying problem is beyond the reach of software to solve.</p><p>The fundamental password problem is simple to explain, but hard to solve: A password that leaves your possession is guaranteed to sacrifice security, no matter its complexity or how hard it may be to guess. Passwords are insecure by their very existence.</p><p>You might say, “but passwords are always stored in encrypted format!” That would be great. More accurately, they are likely stored as a salted hash, as explained below. Even worse, there is no way to verify how passwords are stored, so we must assume that some servers store passwords in cleartext. The truth is that even responsibly stored passwords can be leaked and broken, albeit (and thankfully) with enormous effort. An increasingly pressing problem stems from the nature of passwords themselves: any direct use of a password, today, means that the password must be handled in the clear.</p><p>You say, “but my password is transmitted securely over HTTPS!” This is true.</p><p>You say, “but I know the server stores my password in hashed form, secured so no one can access it!” Well, this puts a lot of faith in the server. Even so, let’s just say that yes, this may be true, too.</p><p>There remains, however, an important caveat — a gap in the end-to-end use of passwords. 
Consider that once a server receives a password, between being securely transmitted and securely stored, the password has to be read and processed. Yes, as cleartext!</p><p>And it gets worse — because so many are used to thinking in software, it’s easy to forget about the vulnerability of hardware. This means that even if the software is somehow trusted, the password must at some point reside in memory. The password must at some point be transmitted over a shared bus to the CPU. These provide vectors of attack to onlookers in many forms. Of course, these attack vectors are far less likely than those presented by transmission and permanent storage, but they are no less severe (recent CPU vulnerabilities such as Spectre and Meltdown should serve as a stark reminder).</p><p>The only way to fix this problem is to get rid of passwords altogether. There is hope! Research and private sector communities are working hard to do just that. New standards are emerging and growing mature. Unfortunately, passwords are so ubiquitous that it will take a long time to agree on and supplant passwords with new standards and technology.</p><p>At Cloudflare, we’ve been asking if there is something that can be done now, imminently. Today’s deep-dive into OPAQUE is one possible answer. OPAQUE is one among many examples of systems that enable a password to be useful without it ever leaving your possession. No one likes passwords, but as long as they’re in use, at least we can ensure they are never given away.</p><p>I’ll be the first to admit that password-based authentication is annoying. Passwords are hard to remember, tedious to type, and notoriously insecure. Initiatives to reduce or replace passwords are promising. For example, <a href="/cloudflare-now-supports-security-keys-with-web-authentication-webauthn/">WebAuthn</a> is a standard for web authentication based primarily on public key cryptography using hardware <a href="https://github.com/github/SoftU2F">(or software)</a> tokens. 
Even so, passwords are frustratingly persistent as an authentication mechanism. Whether their persistence is due to their ease of implementation, familiarity to users, or simple ubiquity on the web and elsewhere, we’d like to make password-based authentication as secure as possible while they persist.</p><p>My internship at Cloudflare focused on OPAQUE, a cryptographic protocol that solves one of the most glaring security issues with password-based authentication on the web: though passwords are typically protected in transit by HTTPS, <b>servers</b> <b>handle them in plaintext</b> to check their correctness. Handling plaintext passwords is dangerous, as accidentally logging or caching them could lead to a catastrophic breach. The goal of the project, rather than to advocate for adoption of any particular protocol, is to show that OPAQUE is a viable option among many for authentication. Because the web case is most familiar to me, and likely many readers, I will use the web as my main example.</p>
    <div>
      <h3>Web Authentication 101: Password-over-TLS</h3>
      <a href="#web-authentication-101-password-over-tls">
        
      </a>
    </div>
    <p>When you type in a password on the web, what happens? The website must check that the password you typed is the same as the one you originally registered with the site. But how does this check work?</p><p>Usually, your username and password are sent to a server. The server then checks if the registered password associated with your username matches the password you provided. Of course, to prevent an attacker eavesdropping on your Internet traffic from stealing your password, your connection to the server should be encrypted via HTTPS (HTTP-over-TLS).</p><p>Despite use of HTTPS, there still remains a glaring problem in this flow: the server must store a representation of your password somewhere. Servers are hard to secure, and breaches are all too common. Leaking this representation can cause catastrophic security problems. (For records of the latest breaches, check out <a href="https://haveibeenpwned.com/">https://haveibeenpwned.com/</a>).</p><p>To make these leaks less devastating, servers often apply a <i>hash function</i> to user passwords. A hash function maps each password to a unique, random-looking value. It’s easy to apply the hash to a password, but almost impossible to reverse the function and retrieve the password. (That said, anyone can guess a password, apply the hash function, and check if the result is the same.)</p><p>With password hashing, plaintext passwords are no longer stored on servers.  An attacker who steals a password database no longer has direct access to passwords. Instead, the attacker must apply the hash to many possible passwords and compare the results with the leaked hashes.</p><p>Unfortunately, if a server hashes only the passwords, attackers can download precomputed <i>rainbow tables</i> containing hashes of trillions of possible passwords and almost instantly retrieve the plaintext passwords. 
(See <a href="https://project-rainbowcrack.com/table.htm">https://project-rainbowcrack.com/table.htm</a> for a list of some rainbow tables).</p><p>With this in mind, a good defense-in-depth strategy is to use <i>salted</i> hashing, where the server hashes your password appended to a random, per-user value called a <i>salt</i>. The server also saves the salt alongside the username, so the user never sees or needs to submit it. When the user submits a password, the server re-computes this hash function using the salt. An attacker who steals password data, i.e., the password representations and salt values, must then guess common passwords one by one and apply the (salted) hash function to each guessed password. Existing rainbow tables won’t help because they don’t take the salts into account, so the attacker needs to make a new rainbow table for each user!</p><p>This (hopefully) slows down the attack enough for the service to inform users of a breach, so they can change their passwords. In addition, the salted hashes should be <i>hardened</i> by applying a hash many times to further slow attacks. (See <a href="/keeping-passwords-safe-by-staying-up-to-date/">https://blog.cloudflare.com/keeping-passwords-safe-by-staying-up-to-date/</a> for a more detailed discussion).</p><p>These two mitigation strategies — encrypting the password in transit and storing salted, hardened hashes — are the current best practices.</p><p>A large security hole remains open. <i>Password-over-TLS</i> (as we will call it) requires users to <b>send plaintext passwords to servers during login</b>, because servers must see these passwords to match against registered passwords on file. Even a well-meaning server could accidentally cache or log your password attempt(s), or become corrupted in the course of checking passwords. 
(For example, Facebook detected in 2019 that it had <a href="https://about.fb.com/news/2019/03/keeping-passwords-secure/">accidentally been storing hundreds of millions of plaintext user passwords</a>). Ideally, servers should never see a plaintext password at all.</p><p>But that’s quite a conundrum: how can you check a password if you never see the password? Enter OPAQUE: a Password-Authenticated Key Exchange (PAKE) protocol that simultaneously proves knowledge of a password and derives a secret key. Before describing OPAQUE in detail, we’ll first summarize PAKE functionalities in general.</p>
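<p>Before moving on to PAKEs, the current best practice described above — salted, hardened hashes — can be sketched with Python’s standard library. The iteration count here is illustrative; deployments should tune it to their hardware:</p>

```python
import hashlib
import hmac
import os

def register(password: str):
    # A random, per-user salt defeats precomputed rainbow tables; the
    # iteration count "hardens" the hash to slow down offline guessing.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest   # the server stores both -- never the plaintext

def verify(password: str, salt: bytes, stored: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    # Constant-time comparison avoids leaking where the digests differ.
    return hmac.compare_digest(candidate, stored)

salt, stored = register("hunter2")
assert verify("hunter2", salt, stored)
assert not verify("hunter3", salt, stored)
```

<p>Note that even here the server still sees the plaintext password at login time in order to recompute the hash — exactly the gap the rest of the post addresses.</p>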
    <div>
      <h3>Password Proofs with Password-Authenticated Key Exchange</h3>
      <a href="#password-proofs-with-password-authenticated-key-exchange">
        
      </a>
    </div>
    <p>Password-Authenticated Key Exchange (PAKE) was proposed by Bellovin and Merritt [1] in 1992, with an initial motivation of allowing password authentication without the possibility of dictionary attacks based on data transmitted over an insecure channel.</p><p>Essentially, a plain, or <i>symmetric</i>, PAKE is a cryptographic protocol that allows two parties who share only a password to establish a strong shared secret key. The goals of PAKE are:</p><ol><li><p>The secret keys will match if the passwords match, and appear random otherwise.</p></li><li><p>Participants do not need to trust third parties (in particular, no Public Key Infrastructure).</p></li><li><p>The resulting secret key is not learned by anyone not participating in the protocol - including those who know the password.</p></li><li><p>The protocol does not reveal either party’s password to the other (unless the passwords match), or to eavesdroppers.</p></li></ol><p>In sum, the only way to successfully attack the protocol is to guess the password correctly while participating in the protocol. 
(Luckily, such attacks can be mostly thwarted by rate-limiting, i.e., blocking a user from logging in after a certain number of incorrect password attempts).</p><p>Given these requirements, password-over-TLS is clearly <i>not</i> a PAKE, because:</p><ul><li><p>It relies on WebPKI, which places trust in third-parties called Certificate Authorities (see <a href="/introducing-certificate-transparency-and-nimbus/">https://blog.cloudflare.com/introducing-certificate-transparency-and-nimbus/</a> for an in-depth explanation of WebPKI and some of its shortcomings).</p></li><li><p>The user’s password is revealed to the server.</p></li><li><p>Password-over-TLS provides the user no assurance that the server knows their password or a derivative of it — a server could accept any input from the user with no checks whatsoever.</p></li></ul><p>That said, plain PAKE is still worse than Password-over-TLS, simply because it requires the server to <i>store</i> plaintext passwords. We need a PAKE that lets the server store salted hashes if we want to beat the current practice.</p><p>An improvement over plain PAKE is what’s called an <i>asymmetric</i> PAKE (aPAKE), because only the client knows the password, and the server knows a <i>hashed</i> password. An aPAKE has the four properties of PAKE, plus one more:</p><ol start="5"><li><p>An attacker who steals password data stored on the server must perform a dictionary attack to retrieve the password.</p></li></ol><p>The issue with most existing aPAKE protocols, however, is that they do not allow for a <i>salted</i> hash (or if they do, they require that salt to be transmitted to the user, which means the attacker has access to the salt beforehand and can begin computing a rainbow table for the user before stealing any data). 
We’d like, therefore, to upgrade the security property as follows:</p><p>5*) An attacker who steals password data stored on the server must perform a <i>per-user</i> dictionary attack to retrieve the password <i>after the data is compromised</i>.</p><p>OPAQUE is the first aPAKE protocol with a formal security proof that has this property: it allows for a completely secret salt.</p>
    <div>
      <h3>OPAQUE - Servers safeguard secrets without knowing them!</h3>
      <a href="#opaque-servers-safeguard-secrets-without-knowing-them">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/39WP72kgESNW6mjf08g5UM/db539617b444e7683825d02b9d704b14/opaque-wordmark.png" />
            
            </figure><p><a href="https://eprint.iacr.org/2018/163.pdf">OPAQUE</a> is what’s referred to as a <i>strong aPAKE</i>, which simply means that it resists these pre-computation attacks by using a secretly salted hash on the server. OPAQUE was proposed and formally analyzed by Stanislaw Jarecki, Hugo Krawczyk, and Jiayu Xu in 2018 (full disclosure: Stanislaw Jarecki is my academic advisor). The name OPAQUE is a combination of the names of two cryptographic protocols: OPRF and PAKE. We already know PAKE, but what is an OPRF? OPRF stands for Oblivious Pseudo-Random Function, which is a protocol by which two parties compute a function <i>F(key, x)</i> that is deterministic but outputs random-looking values. One party inputs the value <i>x</i>, and another party inputs the key - the party who inputs <i>x</i> learns the result <i>F(key, x)</i> but not the key, and the party providing the key learns nothing.  (You can dive into the math of OPRFs here: <a href="/privacy-pass-the-math/">https://blog.cloudflare.com/privacy-pass-the-math/</a>).</p><p>The core of OPAQUE is a method to store user secrets for safekeeping on a server, without giving the server access to those secrets. Instead of storing a traditional salted password hash, the server stores a secret envelope for you that is “locked” by two pieces of information: your password known only by you, and a random secret key (like a salt) known only by the server. To log in, the client initiates a cryptographic exchange that reveals the envelope key to the client, but, importantly, not to the server.</p><p>The server then sends the envelope to the user, who can now retrieve the encrypted keys. (The keys included in the envelope are a private-public key pair for the user, and a public key for the server.) 
These keys, once unlocked, will be the inputs to an Authenticated Key Exchange (AKE) protocol, which allows the user and server to establish a secret key that can be used to encrypt their future communication.</p><p>OPAQUE consists of two phases: credential registration, and login via key exchange.</p>
    <div>
      <h3>OPAQUE: Registration Phase</h3>
      <a href="#opaque-registration-phase">
        
      </a>
    </div>
    <p>To register, the user first signs up for a service and picks a username and password. Registration begins with the OPRF flow we just described: Alice (the user) and Bob (the server) do an OPRF exchange. The result is that Alice has a random key <b><i>rwd</i></b>, derived from the OPRF output <i>F(key, pwd)</i>, where <i>key</i> is a server-owned OPRF key specific to Alice and <i>pwd</i> is Alice’s password.</p><p>Within his OPRF message, Bob sends the public key for his OPAQUE identity. Alice then generates a new private/public key pair, which will be her persistent OPAQUE identity for Bob’s service, and encrypts <i>her</i> private key along with <i>Bob’s</i> public key with <b><i>rwd</i></b> (we will call the result an <i>encrypted envelope</i>). She sends this encrypted envelope along with her public key (unencrypted) to Bob, who stores the data she provided, along with Alice’s specific OPRF key, in a database indexed by her username.</p>
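<p>The OPRF exchange at the heart of this flow can be sketched with a toy Diffie-Hellman-style construction. This uses an 11-bit demonstration prime; real deployments use large prime-order (typically elliptic-curve) groups, so treat this purely as an illustration of the blinding arithmetic:</p>

```python
import hashlib
import secrets

# Toy DH-OPRF (illustration only; not a secure parameter choice).
p = 2039            # safe prime: p = 2q + 1
q = 1019            # order of the quadratic-residue subgroup

def hash_to_group(x: bytes) -> int:
    h = int.from_bytes(hashlib.sha256(x).digest(), "big")
    return pow(h % p, 2, p)   # squaring maps into the order-q subgroup

# Server: holds a per-user OPRF key k; it never learns pwd.
k = secrets.randbelow(q - 1) + 1

# Client: blinds the hashed password before sending it.
pwd = b"correct horse battery staple"
r = secrets.randbelow(q - 1) + 1
blinded = pow(hash_to_group(pwd), r, p)       # -> sent to the server

# Server: evaluates on the blinded element; sees only a random-looking value.
evaluated = pow(blinded, k, p)                # -> sent back to the client

# Client: unblinds to obtain F(k, pwd) = H(pwd)^k without revealing pwd.
r_inv = pow(r, -1, q)
rwd = pow(evaluated, r_inv, p)

assert rwd == pow(hash_to_group(pwd), k, p)   # matches direct evaluation
```

<p>The blinding exponent <i>r</i> hides the password from the server, and unblinding with <i>r⁻¹ mod q</i> cancels it, leaving exactly <i>H(pwd)^k</i> — the deterministic value the client then uses as <i>rwd</i>.</p>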
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5FwREcK0qMw6EVKrxDjUyg/399a52baf731039c16d03644a57cb5f2/OPAQUE-diagram-1_3x.png" />
            
            </figure>
    <div>
      <h3>OPAQUE: Login Phase</h3>
      <a href="#opaque-login-phase">
        
      </a>
    </div>
    <p>The login phase is very similar. It starts the same way as registration — with an OPRF flow. However, on the server side, instead of generating a new OPRF key, Bob instead looks up the one he created during Alice’s registration. He does this by looking up Alice’s username (which she provides in the first message), and retrieving his record of her. This record contains her public key, her encrypted envelope, and Bob’s OPRF key for Alice.</p><p>He also sends over the encrypted envelope which Alice can decrypt with the output of the OPRF flow. (If decryption fails, she aborts the protocol — this likely indicates that she typed her password incorrectly, or Bob isn’t who he says he is). If decryption succeeds, she now has her own secret key and Bob’s public key. She inputs these into an AKE protocol with Bob, who, in turn, inputs his private key and her public key, which gives them both a fresh shared secret key.</p>
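<p>The “decrypt or abort” step above can be sketched with a toy authenticated envelope keyed by the OPRF output <i>rwd</i>. The OPAQUE specification uses a proper AEAD; this HMAC-keystream construction is an illustrative stand-in only:</p>

```python
import hashlib
import hmac

def keystream(key: bytes, n: int) -> bytes:
    out, ctr = b"", 0
    while len(out) < n:
        out += hmac.new(key, ctr.to_bytes(8, "big"), hashlib.sha256).digest()
        ctr += 1
    return out[:n]

def seal_envelope(rwd: bytes, blob: bytes):
    # Toy encrypt-then-MAC keyed by rwd (illustration only, not a real AEAD).
    enc_key = hashlib.sha256(rwd + b"enc").digest()
    mac_key = hashlib.sha256(rwd + b"mac").digest()
    ct = bytes(a ^ b for a, b in zip(blob, keystream(enc_key, len(blob))))
    return ct, hmac.new(mac_key, ct, hashlib.sha256).digest()

def open_envelope(rwd: bytes, ct: bytes, tag: bytes) -> bytes:
    enc_key = hashlib.sha256(rwd + b"enc").digest()
    mac_key = hashlib.sha256(rwd + b"mac").digest()
    if not hmac.compare_digest(hmac.new(mac_key, ct, hashlib.sha256).digest(), tag):
        raise ValueError("decryption failed: wrong password or tampered envelope")
    return bytes(a ^ b for a, b in zip(ct, keystream(enc_key, len(ct))))

keys = b"alice-private-key||bob-public-key"
envelope = seal_envelope(b"rwd-from-correct-password", keys)
assert open_envelope(b"rwd-from-correct-password", *envelope) == keys

try:
    open_envelope(b"rwd-from-wrong-password", *envelope)
except ValueError:
    pass  # Alice aborts the login, exactly as the protocol requires
```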
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/75WquAD9w7Z3AJ8RcBdtiF/03e77ccaad51380d96fcc8e1c395d1b8/OPAQUE-diagram-2_3x.png" />
            
            </figure>
    <div>
      <h3>Integrating OPAQUE with an AKE</h3>
      <a href="#integrating-opaque-with-an-ake">
        
      </a>
    </div>
    <p>An important question to ask here is: what AKE is suitable for OPAQUE? The <a href="https://tools.ietf.org/html/draft-irtf-cfrg-opaque-01">emerging CFRG specification</a> outlines several options, including 3DH and SIGMA-I. However, on the web, we already have an AKE at our disposal: TLS!</p><p>Recall that TLS is an AKE because it provides unilateral (and mutual) authentication with shared secret derivation. The core of TLS is a Diffie-Hellman key exchange, which by itself is <i>unauthenticated</i>, meaning that the parties running it have no way to verify who they are running it with. (This is a problem because when you log into your bank, or any other website that stores your private data, you want to be sure that they are who they say they are). Authentication primarily uses certificates, which are issued by trusted entities through a system called <a href="/how-to-build-your-own-public-key-infrastructure/">Public Key Infrastructure (PKI)</a>. Each certificate is associated with a secret key. To prove its identity, the server presents its certificate to the client, and signs the TLS handshake with its secret key.</p><p>Modifying this ubiquitous certificate-based authentication on the web is perhaps not the best place to start. Instead, an improvement would be to authenticate the TLS shared secret, <i>using</i> OPAQUE, after the TLS handshake completes. In other words, once a server is authenticated with its typical WebPKI certificate, clients could subsequently authenticate to the server. This authentication could take place “post handshake” in the TLS connection using OPAQUE.</p><p><a href="https://datatracker.ietf.org/doc/draft-ietf-tls-exported-authenticator/">Exported Authenticators</a> are one mechanism for “post-handshake” authentication in TLS. They allow a server or client to provide proof of an identity without setting up a new TLS connection. 
Recall that in the standard web case, the server establishes its identity with a certificate (proving, for example, that it is “cloudflare.com”). But if the same server also holds alternate identities, it must run TLS again to prove who it is.</p><p>The basic Exported Authenticator flow resembles a classical challenge-response protocol and works as follows. (We’ll consider the server authentication case only, as the client case is symmetric).</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3eYv5LQOx1cD9iTFxqfkAC/3231d07f2a1b4a140b345454dc2ca2e0/OPAQUE-diagram-3_3x.png" />
            
            </figure><p>At any point after a TLS connection is established, Alice (the client) sends an <i>authenticator request</i> to indicate that she would like Bob (the server) to prove an additional identity. This request includes a context (an unpredictable string — think of this as a challenge), and extensions which include information about what identity the client wants to be provided. For example, the client could include the SNI extension to ask the server for a certificate associated with a certain domain name other than the one initially used in the TLS connection.</p><p>On receipt of the client message, if the server has a valid certificate corresponding to the request, it sends back an <i>exported authenticator</i> which proves that it has the secret key for the certificate. (This message has the same format as an Auth message from the client in the TLS 1.3 handshake - it contains a Certificate, a CertificateVerify, and a Finished message). If the server cannot or does not wish to authenticate with the requested certificate, it replies with an empty authenticator which contains only a well-formed Finished message.</p><p>The client then checks that the Exported Authenticator it receives is well-formed, verifies that the presented certificate is valid, and, if so, accepts the new identity.</p><p>In sum, Exported Authenticators provide authentication in a higher layer (such as the application layer) safely by leveraging the well-vetted cryptography and message formats of TLS. Furthermore, they are tied to the TLS session so that authentication messages can't be copied and pasted from one TLS connection into another. In other words, Exported Authenticators provide exactly the right hooks needed to add OPAQUE-based authentication into TLS.</p>
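<p>The challenge-response shape of this flow can be sketched as follows. One big simplification to note: a real Exported Authenticator carries a certificate and an asymmetric signature verified against its public key, whereas this toy uses a shared-key HMAC purely to show how the proof is bound to the fresh context (the challenge):</p>

```python
from dataclasses import dataclass
import hashlib
import hmac
import os

# Toy model of the Exported Authenticator request/response (illustrative).

@dataclass
class AuthenticatorRequest:
    context: bytes            # unpredictable challenge, bound to this session
    requested_identity: str   # e.g. an SNI extension naming another hostname

identity_key = os.urandom(32)  # stands in for the certificate's secret key

def make_authenticator(req: AuthenticatorRequest, key: bytes) -> bytes:
    # Proves possession of the identity's key by binding the proof to the
    # client's fresh context, so it cannot be replayed in another session.
    msg = req.context + req.requested_identity.encode()
    return hmac.new(key, msg, hashlib.sha256).digest()

req = AuthenticatorRequest(context=os.urandom(16),
                           requested_identity="alt.example.com")
auth = make_authenticator(req, identity_key)

# Verifier side (toy symmetric check; real TLS verifies a signature).
expected = hmac.new(identity_key,
                    req.context + b"alt.example.com",
                    hashlib.sha256).digest()
assert hmac.compare_digest(auth, expected)

# A proof made for a different context (another session) does not verify.
stale = AuthenticatorRequest(context=os.urandom(16),
                             requested_identity="alt.example.com")
assert make_authenticator(stale, identity_key) != auth
```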
    <div>
      <h3>OPAQUE with Exported Authenticators (OPAQUE-EA)</h3>
      <a href="#opaque-with-exported-authenticators-opaque-ea">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/13SUbsSrju6qTw13hFBhoh/8a41eefa575f8661b6c0b821d4516b72/OPAQUE-diagram-2_3x-1.png" />
            
            </figure><p><a href="https://datatracker.ietf.org/doc/html/draft-sullivan-tls-opaque-00">OPAQUE-EA</a> allows OPAQUE to run at any point after a TLS connection has already been set up. Recall that Bob (the server) will store his OPAQUE identity, in this case a signing key and verification key, and Alice will store her identity — encrypted — on Bob’s server. (The registration flow where Alice stores her encrypted keys is the same as in regular OPAQUE, except she stores a signing key, so we will skip straight to the login flow.) Alice and Bob run two request-authenticate EA flows, one for each party, and OPAQUE protocol messages ride along in the extensions section of the EAs. Let’s look at how this works in detail.</p><p>First, Alice generates her OPRF message based on her password. She creates an Authenticator Request asking for Bob’s OPAQUE identity, includes (in the extensions field) her username and her OPRF message, and sends this to Bob over their established TLS connection.</p><p>Bob receives the message and looks up Alice’s username in his database. He retrieves her OPAQUE record containing her verification key and encrypted envelope, and his OPRF key. He uses the OPRF key on the OPRF message, and creates an Exported Authenticator proving ownership of his OPAQUE signing key, with an extension containing his OPRF message and the encrypted envelope. Additionally, he sends a new Authenticator Request asking Alice to prove ownership of her OPAQUE signing key.</p><p>Alice parses the message and completes the OPRF evaluation using Bob’s message to get output <i>rwd</i>, and uses <i>rwd</i> to decrypt the envelope. This reveals her signing key and Bob’s public key. She uses Bob’s public key to validate his Exported Authenticator proof, and, if it checks out, she creates and sends an Exported Authenticator proving that she holds the newly decrypted signing key. 
Bob verifies her Exported Authenticator and, if it is valid, accepts her login.</p>
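<p>To make the OPRF round trip at the heart of this flow concrete, here is a toy sketch in Python. The real protocol (and the CIRCL implementation) runs over elliptic-curve groups; this hypothetical version uses exponentiation modulo the Mersenne prime 2<sup>127</sup>-1 purely to show the blind/evaluate/unblind shape, and is not secure for real use:</p>

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime; the toy group is the integers mod P


def hash_to_group(password: bytes) -> int:
    """Toy hash-to-group; real OPRFs use a proper hash-to-curve."""
    return int.from_bytes(hashlib.sha256(password).digest(), "big") % (P - 1) + 1


def blind(password: bytes):
    """Alice: blind H(pw) with a random exponent r (her OPRF message)."""
    h = hash_to_group(password)
    while True:
        r = secrets.randbelow(P - 1)
        if r > 1 and math.gcd(r, P - 1) == 1:  # r must be invertible mod P-1
            return r, pow(h, r, P)


def evaluate(server_key: int, blinded: int) -> int:
    """Bob: apply his OPRF key k without ever seeing H(pw) or pw."""
    return pow(blinded, server_key, P)


def unblind(r: int, evaluated: int) -> int:
    """Alice: strip the blind to recover H(pw)^k, the basis of rwd."""
    r_inv = pow(r, -1, P - 1)
    return pow(evaluated, r_inv, P)


# One round trip: Alice learns H(pw)^k; Bob learns nothing about pw.
k = secrets.randbelow(P - 1) + 1
r, msg = blind(b"hunter2")
rwd_basis = unblind(r, evaluate(k, msg))
assert rwd_basis == pow(hash_to_group(b"hunter2"), k, P)
```

<p>Because the blind cancels exactly, Alice gets the same <i>rwd</i> on every login with the correct password, while each OPRF message Bob sees is freshly randomised.</p>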
    <div>
      <h3>My project: OPAQUE-EA over HTTPS</h3>
      <a href="#my-project-opaque-ea-over-https">
        
      </a>
    </div>
    <p>Everything described above is supported by lots and lots of theory that has yet to find its way into practice. My project was to turn the theory into reality. I started with written descriptions of <a href="https://tools.ietf.org/html/draft-ietf-tls-exported-authenticator-13">Exported Authenticators</a>, <a href="https://tools.ietf.org/html/draft-irtf-cfrg-opaque-01">OPAQUE</a>, and a preliminary draft of <a href="https://tools.ietf.org/html/draft-sullivan-tls-opaque-00">OPAQUE-in-TLS</a>. My goal was to get from those to a working prototype.</p><p>My demo shows the feasibility of implementing OPAQUE-EA on the web, completely removing plaintext passwords from the wire, even encrypted. This provides a possible alternative to the current password-over-TLS flow with better security properties, but no visible change to the user.</p><p>A few of the implementation details are worth knowing. In computer science, abstraction is a powerful tool. It means that we can often rely on existing tools and APIs to avoid duplication of effort. In my project I relied heavily on <a href="https://github.com/bifurcation/mint">mint</a>, an open-source implementation of TLS 1.3 in Go that is great for prototyping. I also used <a href="https://github.com/cloudflare/circl/tree/master/oprf">CIRCL’s OPRF API</a>. I built libraries for Exported Authenticators, the core of OPAQUE, and OPAQUE-EA (which ties together the two).</p><p>I made the web demo by wrapping the OPAQUE-EA functionality in a simple HTTP server and client that pass messages to each other over HTTPS. 
Since a browser can’t run Go, I compiled from Go to WebAssembly (WASM) to get the Go functionality in the browser, and wrote a simple JavaScript script to call the WASM functions as needed.</p><p>Since current browsers do not give access to the underlying TLS connection on the client side, I had to implement a workaround to allow the client to access the exporter keys: the server simply computes the keys and sends them to the client over HTTPS. This workaround reduces the security of the resulting demo — it means that trust is placed in the server to provide the right keys. Even so, the user’s password is still safe even if a malicious server provides bad keys — the user just doesn’t have assurance that they actually previously registered with that server. In the future, however, browsers could include a mechanism to support exported keys and allow OPAQUE-EA to run with its full security properties.</p><p>You can explore my implementation <a href="https://github.com/cloudflare/opaque-ea">on GitHub</a>, and even follow the instructions to spin up your own OPAQUE-EA test server and client. I’d like to stress, however, that the implementation is meant as a proof-of-concept only, and must not be used for production systems without significant further review.</p>
    <div>
      <h3>OPAQUE-EA Limitations</h3>
      <a href="#opaque-ea-limitations">
        
      </a>
    </div>
    <p>Despite its great properties, there will definitely be some hurdles in bringing OPAQUE-EA from a proof-of-concept to a fully fledged authentication mechanism.</p><p><b>Browser support for TLS exporter keys.</b> As mentioned briefly before, to run OPAQUE-EA in a browser, you need to access secrets from the TLS connection called <i>exporter keys</i>. There is no way to do this in the current most popular browsers, so support for this functionality will need to be added.</p><p><b>Overhauling password databases.</b> To adopt OPAQUE-EA, servers not only need to update their password-checking logic, but also to completely overhaul their password databases. Because OPAQUE relies on special password representations that can only be generated interactively, existing salted hashed passwords cannot be automatically updated to OPAQUE records. Servers will likely need to run a special OPAQUE registration flow on a user-by-user basis. Because OPAQUE relies on buy-in from both the client and the server, servers may need to support the old method for a while until all clients catch up.</p><p><b>Reliance on emerging standards.</b> OPAQUE-EA relies on OPRFs, which are in the process of standardization, and on Exported Authenticators, a proposed standard. This means that support for these dependencies is not yet available in most existing cryptographic libraries, so early adopters may need to implement these tools themselves.</p>
    <div>
      <h3>Summary</h3>
      <a href="#summary">
        
      </a>
    </div>
    <p>As long as people still use passwords, we’d like to make the process as secure as possible. Current methods rely on the risky practice of handling plaintext passwords on the server side while checking their correctness. PAKEs, and specifically aPAKEs, allow secure password login without ever letting the server see the passwords.</p><p>OPAQUE is also being explored within other companies. According to Kevin Lewi, a research scientist from the Novi Research team at Facebook, they are “excited by the strong cryptographic guarantees provided by OPAQUE and are actively exploring OPAQUE as a method for further safeguarding credential-protected fields that are stored server-side.”</p><p>OPAQUE is one of the best aPAKEs out there, and can be fully integrated into TLS. You can check out the core OPAQUE implementation <a href="https://github.com/cloudflare/opaque-core">here</a> and the demo TLS integration <a href="https://github.com/cloudflare/opaque-ea">here</a>. A running version of the demo is also available <a href="https://opaque.research.cloudflare.com/">here</a>. A TypeScript client implementation of OPAQUE is coming soon. If you're interested in implementing the protocol, or encounter any bugs with the current implementation, please drop us a line at <a href="mailto:ask-research@cloudflare.com">ask-research@cloudflare.com</a>! Consider also subscribing to the <a href="https://irtf.org/cfrg">IRTF CFRG mailing list</a> to track discussion about the OPAQUE specification and its standardization.</p><p>[1] Bellovin, S. M., and Merritt, M. “Encrypted key exchange: Password-based protocols secure against dictionary attacks.” In Proc. IEEE Computer Society Symposium on Research in Security and Privacy (Oakland, May 1992), pp. 72–84.</p> ]]></content:encoded>
            <category><![CDATA[Privacy Week]]></category>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[Passwords]]></category>
            <category><![CDATA[Protocols]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Salt]]></category>
            <guid isPermaLink="false">7daEt3ab6jwAnizdsMzbfl</guid>
            <dc:creator>Tatiana Bradley</dc:creator>
        </item>
        <item>
            <title><![CDATA[Pwned Passwords Padding (ft. Lava Lamps and Workers)]]></title>
            <link>https://blog.cloudflare.com/pwned-passwords-padding-ft-lava-lamps-and-workers/</link>
            <pubDate>Wed, 04 Mar 2020 13:00:00 GMT</pubDate>
            <description><![CDATA[ Starting today, we are offering a new security advancement in the Pwned Passwords API - API clients can receive responses padded with random data. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>The Pwned Passwords API (part of Troy Hunt’s <a href="https://haveibeenpwned.com/">Have I Been Pwned</a> service) is used tens of millions of times each day to alert users if their credentials are breached in a variety of online services, browser extensions and applications. Using Cloudflare, the API caches around 99% of requests, making it very efficient to run.</p><p>From today, we are offering a new security advancement in the Pwned Passwords API - API clients can receive responses padded with random data. This exists to protect against any potential attack vectors which seek to use passive analysis of the size of API responses to identify which anonymised bucket a user is querying. I am hugely grateful to security researcher Matt Weir, whom I met at <a href="https://passwordscon.org/">PasswordsCon</a> in Stockholm; he explored a <a href="https://github.com/lakiw/pwnedpasswords_padding">proof-of-concept</a> analysis of unpadded API responses in Pwned Passwords and drove some of the work to add padded responses.</p><p>Now, by passing a header of “Add-Padding” with a value of “true”, Pwned Passwords API users are able to request padded API responses (to a minimum of 800 entries with additional padding of a further 0-200 entries). The padding consists of randomly generated hash suffixes with the usage count field set to “0”.</p><p>Clients using this approach should seek to exclude 0-usage hash suffixes from breach validation. Given that most implementations of Pwned Passwords simply do string matching on the suffix of a hash, there is no real performance implication of searching through the padding data. The false positive risk if a hash suffix matches a randomly generated response is very low, 619/(2<sup>35*4</sup>) ≈ 4.44 x 10<sup>-40</sup>. 
This means you’d need on the order of 10<sup>39</sup> queries before a false positive becomes at all likely; even after that many queries, the expected number of collisions is only around 0.44.</p><p>In the future, non-padded responses will be deprecated outright (and all responses will be padded) once clients have had a chance to update.</p><p>You can see an example padded request by running the following curl request:</p>
            <pre><code>curl -H Add-Padding:true https://api.pwnedpasswords.com/range/FFFFF</code></pre>
            
    <div>
      <h2>API Structure</h2>
      <a href="#api-structure">
        
      </a>
    </div>
    <p>The high level structure of the Pwned Passwords API is discussed in my original blog post “<a href="/validating-leaked-passwords-with-k-anonymity/">Validating Leaked Passwords with k-Anonymity</a>”. In essence, a client queries the API for the first 5 hexadecimal characters of a SHA-1 hashed password (amounting to 20 bits), a list of responses is returned with the remaining 35 hexadecimal characters of the hash (140 bits) of every breached password in the dataset. Each hash suffix is appended with a colon (“:”) and the number of times that given hash is found in the breached data.</p><p>An example query for <i>FFFFF</i> can be seen below, with the structure represented:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3NE4abRyEojOo5caiuNQ0D/9d824a10e4a833f7b8acc10dd52eeb5b/pwned_passwords_curl.png" />
            
            </figure><p>Without padding, the message length varies given the amount of hash suffixes in the bucket that is queried. It is known that it is possible to fingerprint TLS traffic based on the encrypted message length - fortunately this padding can be inserted in the API responses themselves (in the HTTP content). We can see the difference in download size between two unpadded buckets by running:</p>
            <pre><code>$ curl -so /dev/null https://api.pwnedpasswords.com/range/E0812 -w '%{size_download} bytes\n'
17022 bytes
$ curl -so /dev/null https://api.pwnedpasswords.com/range/834EF -w '%{size_download} bytes\n'
25118 bytes</code></pre>
            <p>The randomised padded entries can be found with the “:0” suffix (indicating usage count); for example, below the top three entries are real entries whilst the last three represent padding data:</p>
            <pre><code>FF1A63ACC70BEA924C5DBABEE4B9B18C82D:10
FF8A0382AA9C8D9536EFBA77F261815334D:12
FFEE791CBAC0F6305CAF0CEE06BBE131160:2
2F811DCB8FF6098B838DDED4D478B0E4032:0
A1BABA501C55ACB6BDDC6D150CF585F20BE:0
9F31397459FF46B347A376F58506E420A58:0</code></pre>
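<p>Putting the pieces together, a client-side check amounts to hashing the password with SHA-1, querying with the first five hex characters, and scanning the response for the remaining 35, skipping any padding entry whose count is 0. A minimal sketch in Python (the function name and example body are illustrative, not from any official client):</p>

```python
import hashlib


def breach_count(password: str, response_body: str):
    """Return the breach count for `password` from a /range response,
    or None if absent. Padding entries carry a usage count of 0 and are
    skipped, so they can never cause a false breach report."""
    digest = hashlib.sha1(password.encode()).hexdigest().upper()
    prefix, suffix = digest[:5], digest[5:]  # 20 bits queried, 140 bits kept local
    for line in response_body.splitlines():
        line_suffix, _, count = line.strip().partition(":")
        if line_suffix == suffix and int(count) > 0:
            return int(count)
    return None


# Illustrative response body: SHA-1("password") ends with the suffix below.
body = (
    "1E4C9B93F3F0682250B6CF8331B7EE68FD8:10\n"  # real entry
    "2F811DCB8FF6098B838DDED4D478B0E4032:0"     # padding entry, ignored
)
assert breach_count("password", body) == 10
assert breach_count("letmein", body) is None
```

<p>Note that only the 5-character prefix would ever be sent over the network; the comparison against the full suffix happens entirely on the client.</p>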
            
    <div>
      <h2>Compression and Randomisation</h2>
      <a href="#compression-and-randomisation">
        
      </a>
    </div>
    <p>Cloudflare supports both GZip and Brotli for compression. Compression benefits the Pwned Passwords API as responses are hexadecimal data represented in ASCII. That said, compression is somewhat limited given the Avalanche Effect in hashing algorithms (a small change in an input results in a completely different hash output) - each range searched has dramatically different input passwords, and the remaining 35 characters of the SHA-1 hashes are similarly different, with no expected similarity between them.</p><p>Accordingly, if one were to simply pad responses with constant data (say “000...”), compression could mean that responses padded to the same length could still be differentiated after compression. Similarly, even without compression, padding messages with the same data could still permit credible attacks.</p><p>Padding is therefore generated from randomly generated entries. In order not to break clients, such padding is generated to look like legitimate hash suffixes. It is possible, however, to identify such entries as randomised padding: as the Pwned Passwords API contains a count field (a colon after the hash suffix followed by a numerical count), randomised entries can be distinguished by their usage count of 0.</p>
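<p>The interaction between compression and padding is easy to demonstrate. The sketch below uses zlib's DEFLATE (as a stand-in for GZip) to compress a synthetic bucket padded once with a repeated constant entry and once with random entries; the constant padding all but disappears after compression, betraying the true response size:</p>

```python
import random
import zlib


def fake_entry() -> str:
    """A random 35-hex-character suffix with the 0 usage count of padding."""
    return "".join(random.choice("0123456789ABCDEF") for _ in range(35)) + ":0"


bucket = "\n".join(fake_entry() for _ in range(500))   # stand-in for real suffixes
constant_pad = "\n".join(["0" * 35 + ":0"] * 300)      # the same entry repeated
random_pad = "\n".join(fake_entry() for _ in range(300))

const_size = len(zlib.compress((bucket + "\n" + constant_pad).encode()))
rand_size = len(zlib.compress((bucket + "\n" + random_pad).encode()))

# The repeated constant entries compress away almost entirely; the random
# entries do not, so only randomised padding keeps compressed sizes similar
# regardless of how much padding was added.
assert const_size < rand_size
```
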
    <div>
      <h2>Lava Lamps and Workers</h2>
      <a href="#lava-lamps-and-workers">
        
      </a>
    </div>
    <p>I’ve written before about the <a href="/optimising-caching-on-pwnedpasswords/">cache optimisation of Pwned Passwords</a> (including our use of Cloudflare Workers). Cloudflare Workers has an additional benefit: Workers run before elements are pulled from the cache.</p><p>This allows for randomised entries to be generated dynamically on a request-to-request basis instead of being cached. This means the resulting randomised padding can differ from request to request (and thus so can the number of entries in a given response and the size of the response).</p><p>Cloudflare Workers supports the <a href="https://developers.cloudflare.com/workers/reference/apis/web-crypto/">Web Crypto API</a>, providing exposure of a cryptographically sound random number generator. This random number generator is used to decide the variable amount of padding added to each response. Whilst a cryptographically secure random number generator is used for determining the amount of padding, the random hexadecimal padding does not need to be indistinguishable from the real hashes, so for computational performance we use the non-cryptographically secure <i>Math.random()</i> to generate the actual content of the padding.</p><p>Famously, one of the sources of entropy used in Cloudflare servers is <a href="/lavarand-in-production-the-nitty-gritty-technical-details/">sourced from Lava Lamps</a>. By filming a wall of lava lamps in our San Francisco office (with individual photoreceptors picking up on random noise beyond the movement of the lava), we are able to generate random seed data used in servers (complemented by other sources of entropy along the way). This lava lamp entropy is used alongside the randomness sources on individual servers. This entropy is used to seed <i>cryptographically secure pseudorandom number generator</i> (CSPRNG) algorithms when generating random numbers. 
The Cloudflare Workers runtime uses the <a href="https://developers.cloudflare.com/workers/about/how-it-works/">V8 engine</a> for JavaScript, with randomness <a href="https://github.com/v8/v8/blob/master/src/base/utils/random-number-generator.cc#L63">sourced</a> from <i>/dev/urandom</i> on the server itself.</p><p>Each response is padded to a minimum of 800 hash suffixes, plus a randomly generated amount of additional padding (up to 200 further entries).</p><p>This can be seen in two ways. Firstly, we can see that repeating the same request to the same endpoint (with the underlying response being cached) yields a randomised number of lines, between 800 and 1000:</p>
            <pre><code>$ for run in {1..10}; do curl -s -H Add-Padding:true https://api.pwnedpasswords.com/range/FFFFF | wc -l; done
     831
     956
     870
     980
     932
     868
     856
     961
     912
     827</code></pre>
            <p>Secondly, we can see a randomised download size in each response:</p>
            <pre><code>$ for run in {1..10}; do curl -so /dev/null -H Add-Padding:true https://api.pwnedpasswords.com/range/FFFFF -w '%{size_download} bytes\n'; done
35572 bytes
37358 bytes
38194 bytes
33596 bytes
32304 bytes
37168 bytes
32532 bytes
37928 bytes
35154 bytes
33178 bytes</code></pre>
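<p>The production logic lives in a Worker written in JavaScript, but the approach can be sketched in a few lines of Python (the function name is hypothetical): a CSPRNG picks the padded length between 800 and 1,000, and a fast non-cryptographic generator fills in the fake suffixes:</p>

```python
import random
import secrets

HEX = "0123456789ABCDEF"


def pad_bucket(entries: list[str]) -> list[str]:
    """Pad a bucket to 800 entries plus a further 0-200, per request."""
    target = 800 + secrets.randbelow(201)   # CSPRNG chooses the amount
    padded = list(entries)
    while len(padded) < target:
        # The padding content need not be unpredictable, so (like the
        # Worker's Math.random()) a fast non-CSPRNG is sufficient here.
        suffix = "".join(random.choice(HEX) for _ in range(35))
        padded.append(suffix + ":0")        # count 0 marks it as padding
    return padded


padded = pad_bucket(["FF1A63ACC70BEA924C5DBABEE4B9B18C82D:10"])
assert 800 <= len(padded) <= 1000
assert all(entry.endswith(":0") for entry in padded[1:])
```

<p>Because the padding runs per request, even a fully cached bucket produces a different response length every time, which is exactly the behaviour the curl loops above show.</p>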
            
    <div>
      <h2>Future Work and Conclusion</h2>
      <a href="#future-work-and-conclusion">
        
      </a>
    </div>
    <p>There has been a considerable amount of research complementing the anonymity approach in Pwned Passwords. For example, Google and Stanford have written a paper about their approach, implemented in Google Password Checkup: “Protecting accounts from credential stuffing with password breach alerting” [<a href="https://www.usenix.org/system/files/sec19-thomas.pdf">Usenix</a>].</p><p>We have done a significant amount of work exploring more advanced protocols for Pwned Passwords; some of this work can be found in a paper we worked on with academics at Cornell University, “Protocols for Checking Compromised Credentials” [<a href="https://dl.acm.org/doi/abs/10.1145/3319535.3354229">ACM</a> or <a href="https://arxiv.org/pdf/1905.13737.pdf">arXiv preprint</a>]. This research offers two new protocols (FSB, frequency smoothing bucketization, and IDB, identifier-based bucketization) to further reduce information leakage in the APIs.</p><p>Further work is needed before these protocols gain the production worthiness that we’d like before they are shipped - but, as always, we’ll keep you updated here on our blog.</p>
            <category><![CDATA[Passwords]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Cryptography]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[LavaRand]]></category>
            <guid isPermaLink="false">3ikC6g6s6oo3MKLxFvsn3h</guid>
            <dc:creator>Junade Ali</dc:creator>
        </item>
        <item>
            <title><![CDATA[Banking-Grade Credential Stuffing: The Futility of Partial Password Validation]]></title>
            <link>https://blog.cloudflare.com/banking-grade-credential-stuffing-the-true-effectiveness-of-a-partial-password-validation/</link>
            <pubDate>Thu, 20 Dec 2018 13:00:00 GMT</pubDate>
            <description><![CDATA[ Recently when logging into one of my credit card providers, I was greeted by a familiar screen. After entering in my username, the service asked me to supply 3 random characters from my password to validate ownership of my account. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Recently when logging into one of my credit card providers, I was greeted by a familiar screen. After entering my username, the service asked me to supply 3 random characters from my password to validate ownership of my account.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7sKJCKq5lJ8TWVbwBVPCpp/2129c72a003c8412e81625c1ce71348e/image-4.png" />
            
            </figure><p>It is increasingly common knowledge in the InfoSec community that this practice is the antithesis of what we now understand to be secure password management.</p><p>For starters, sites prompting you for Partial Password Validation cannot store your passwords securely using algorithms like BCrypt or Argon2. If the service provider is ever breached, such plain-text passwords can be used to log in to other sites where the account holder uses the same password (known as a Credential Stuffing attack).</p><p>The increased difficulty of using long, randomly generated passwords from Password Managers leads to users favouring their memory over securely generated unique passwords. Those using Password Managers must extract their password from their vault, paste it somewhere else and then work out the correct characters to enter. This increased complexity further incentivises users to (re-)use simple passwords they can remember and count off on their fingers (and likely reuse on other sites).</p><p>This is not too distinct from the thinking that originally brought us complex password composition rules, which also ultimately incentivised password re-use. I have previously written about how <a href="/how-developers-got-password-security-so-wrong/">broken password management</a> policies came about and the collaborative solution I worked on with security researcher Troy Hunt to <a href="/validating-leaked-passwords-with-k-anonymity/">dissuade password reuse</a> (examples of <a href="https://www.troyhunt.com/pwned-passwords-in-practice-real-world-examples-of-blocking-the-worst-passwords/">this implementation</a> can be found on Troy’s blog).</p><p>However, in this blog post I want us to follow through some of the logic of Partial Password Validation that is used by so many websites, including banks and services which certainly contain sensitive data. Let’s see how effective this strategy really is.</p>
    <div>
      <h2>How secure is partial password data?</h2>
      <a href="#how-secure-is-partial-password-data">
        
      </a>
    </div>
    <p>In one data breach, we saw that <a href="https://www.troyhunt.com/86-of-passwords-are-terrible-and-other-statistics/">86% of users were using passwords already in data breaches</a>. So, if we have certain characters of a password, how easy is it to identify that password from a data breach?</p><p>In other words, how much information can 3 characters give us about a password a user logs in with (assuming such a password is breached)?</p><p>By selecting 10,000 passwords randomly from a public database of 488,129 breached passwords (after excluding passwords less than 8 characters long), I selected 3 random characters from each password and saw how many passwords in the complete dataset had the same characters in the same places as the original password. This simulation was run 11 times per password considered.</p><p>In the 110,000 simulations, 58% of tested 3-character password segments were only valid for that password in the database. Additionally, 28% could be associated with only one other password, and 8% with two other passwords.</p><p>As such, we can demonstrate that in this database, knowing only 3 characters of a password is sufficient to breach a significant proportion of accounts. Using slow brute-force attacks, hundreds of passwords can be checked against an account before triggering any form of rate limiting (if there is any); combining password segments with common password databases brings the number of search candidates down substantially.</p>
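<p>The core of one simulation trial is simple to reproduce. A sketch in Python, with a tiny stand-in dataset rather than the 488,129-password database:</p>

```python
import random


def partial_match_count(target: str, dataset: list[str], k: int = 3) -> int:
    """Reveal k random character positions of `target` and count how many
    passwords in `dataset` agree with it at exactly those positions."""
    positions = random.sample(range(len(target)), k)
    return sum(
        1
        for pw in dataset
        if all(len(pw) > i and pw[i] == target[i] for i in positions)
    )


# Toy dataset: the target always matches itself, so a count of 1 means the
# 3 revealed characters pin the password down uniquely within the dataset.
dataset = ["password1", "hunter221", "letmein99", "qwerty123"]
count = partial_match_count("password1", dataset)  # always 1 for this data
```

<p>Run against a real breach corpus, the same counting loop produces the 58%/28%/8% figures above.</p>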
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3Tp7HfUBAEmf26AhZQCwI5/9d0dbb6f928d61ede2b336b161eee65c/image-8.png" />
            
            </figure><p>Research into these tactics exists beyond simulations alone. Prior work by the University of Edinburgh and the University of Aberdeen, in <a href="http://groups.inf.ed.ac.uk/security/passwords/pps.pdf">a paper on Partial Password implementations and attacks</a>, created a password projection dictionary using a larger data breach (RockYou) alongside credentials recorded during login attempts to user accounts. Results were based on a survey of user logins to online banking sites and showed a worryingly high success rate in predicting solutions to challenge pages (in particular, against banking websites which required no, or only a weak, second credential):</p><blockquote><p>We find that with 6 guesses, an attacker can respond correctly to 2-place challenges on 6-digit PINs with a success rate of 30%. Recording up to 4 runs, an attacker can succeed over 60% of the time, or by combining guessing and recording, over 90%. Alphanumeric passwords do somewhat better: responding to 3-place challenges on 8-character alphanumeric passwords, with up to 10 guesses, the attacker can achieve a success rate of 5.5%. Combining guessing and recording increases that to 25% with one recorded run and at least 80% with four runs.</p></blockquote><p>Partial Password Validation arguably exists to prevent key loggers (either in software or hardware form) from learning your complete password; however, let's now put this hypothesis to the test.</p>
    <div>
      <h2>Getting the full password</h2>
      <a href="#getting-the-full-password">
        
      </a>
    </div>
    <p>Every time my banking site asks for a set of random characters of my password on login, someone who knows my keystrokes can learn even more. With personal computer ownership at an all-time high (e.g. household computer ownership <a href="https://www.statista.com/statistics/289191/household-penetration-of-home-computers-in-the-uk/">at 88% in the UK</a>) it is increasingly likely such users will repeatedly log in to their services using a single device.</p><p>Let's consider the threat model of whether Partial Password Validation actually helps. How many times would a user need to log in until their password is fully breached? This problem represents a modified form of the <a href="https://en.wikipedia.org/wiki/Coupon_collector's_problem">coupon collector's problem</a> (except with multiple selections), and as such <a href="https://math.stackexchange.com/questions/131664/coupon-collector-problem-with-batched-selections">approximations can be calculated</a> in mathematical terms; but instead, here I'll demonstrate this using real password data and simulations.</p><p>It is trivial to calculate how many possible ways 3 random characters can be selected from a password using binomial coefficients; there are 220 ways of selecting 3 characters from a 12-character password, but it is fundamentally important to know that, at the end of the day, you only need 12 characters to breach such a password. Every time a user is prompted for more random characters, we have more chance of finding out what their password is.</p><p>By repeatedly taking the lengths of random passwords (over 8 characters in length), I ran a simulation to see how many logins it would take for the entire password to become known. 
110,000 simulations were run collectively on 10,000 passwords.</p><p><b>In simulation, 58% of passwords are revealed in their entirety after 7 logins, 90% after 12 and 99% after 19 logins.</b> These results would be even more stark with a maximum password length constraint (and/or with passwords of fewer than 8 characters included), as is implemented by many online banks using such an approach.</p>
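<p>The per-password simulation is a few lines of Python (a sketch of the approach, not the original code):</p>

```python
import random


def logins_until_revealed(length: int, k: int = 3) -> int:
    """Count logins, each disclosing k random character positions,
    until every position of the password has been observed at least once
    (a batched variant of the coupon collector's problem)."""
    revealed: set[int] = set()
    logins = 0
    while len(revealed) < length:
        revealed.update(random.sample(range(length), k))
        logins += 1
    return logins


# Distribution over many runs for a 12-character password; the best
# possible case is ceil(12 / 3) = 4 perfectly disjoint challenges.
runs = [logins_until_revealed(12) for _ in range(1000)]
```

<p>Tallying what fraction of `runs` falls at or below 7, 12 and 19 logins reproduces the shape of the percentages quoted above.</p>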
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1HcjRHHCXJUqYomqPEJ87i/3f069cf91455b95a8ba18e0e847a8b80/image-12.png" />
            
            </figure>
    <div>
      <h2>The Road to Multi-Factor Authentication</h2>
      <a href="#the-road-to-multi-factor-authentication">
        
      </a>
    </div>
    <p>There we have it; there is no substantive keylogger protection from Partial Password Validation. All you get is a guarantee that the service you're dealing with isn't capable of properly securing your passwords (or is too lazy to do so).</p><p>Password Managers are a blessing: they provide a huge boost in security by allowing users to set strong, unique passwords on a per-site basis, and Partial Password Validation poses a massive usability barrier to users adopting such practices. In the words of the British National Cyber Security Centre: "<a href="https://www.ncsc.gov.uk/blog-post/let-them-paste-passwords">Let them paste passwords</a>".</p><p>So what do you do instead? Instead of merely using what your customers <i>know</i> to secure their accounts, additionally use something they <i>have</i>. The industry standard for keylogger protection is Two Factor Authentication (or Multi Factor Authentication); using either apps on your users' smartphones to generate cryptographically secure tokens, or, as is increasingly the case with banks, giving your customers hardware Two Factor token generators to validate their login attempts. A hardware token generator is a distinct piece of hardware that generates secure tokens but cannot itself be used to log in to an online account.</p><p>Increasingly, SMS is used to send a user a password to validate their login attempt, in addition to the password the user enters. SMS isn't technically Second Factor Authentication, and instead represents a One Time Password. This <a href="https://blog.1password.com/totp-for-1password-users/">can also be said</a> for users logging in to a site, using a password, on the same device they use to generate their one time token. 
Additionally, SMS tokens are not cryptographically generated on the device itself using the <a href="https://tools.ietf.org/html/rfc6238">RFC 6238 standard</a> that's employed by 2FA apps, but are sent over a protocol with <a href="https://www.theverge.com/2017/9/18/16328172/sms-two-factor-authentication-hack-password-bitcoin">numerous security shortcomings</a>. In practice, this means that interception can reveal the One Time Password that you receive.</p><p>Whilst there are better alternatives to SMS, and indeed users who are educated in such tactics should be allowed to use them, there is a usability gap to setting up Two Factor Authentication, and somewhat of a cost and operational barrier to issuing users hardware devices. However, in a time when the vast majority of users guard their accounts using nothing more than their passwords, it is somewhat excusable to opt users in to SMS One Time Passwords by default, whilst providing the option of Two Factor Authentication if desired. The UK tax office (Her Majesty's Revenue &amp; Customs) has gone down this route, <a href="https://developer.service.hmrc.gov.uk/api-documentation/docs/authorisation/two-step-verification">requiring users to set up some form of Multi-Step Verification</a> before being able to proceed with managing their accounts; either via SMS, landline or an authenticator app.</p><p>In conclusion: whilst SMS isn't the most secure way of delivering a One Time Password, it offers huge advantages over a user relying on a password alone. Any form of One Time Password is better than none at all, and it provides a path for users to handle the security threats that Partial Password Validation promises to address, but falls short on.</p><p><i>Want to help businesses, small and big, protect their users from security threats, small and large? The Cloudflare technical support team is </i><a href="https://www.cloudflare.com/careers/departments/customer-support/"><i>hiring</i></a><i>.</i></p> ]]></content:encoded>
            <category><![CDATA[Passwords]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">74Z1CAmW05crCbV74miJbi</guid>
            <dc:creator>Junade Ali</dc:creator>
        </item>
        <item>
            <title><![CDATA[Leave your VPN and cURL secure APIs with Cloudflare Access]]></title>
            <link>https://blog.cloudflare.com/leave-your-vpn-and-curl-secure-apis-with-cloudflare-access/</link>
            <pubDate>Fri, 05 Oct 2018 18:30:07 GMT</pubDate>
            <description><![CDATA[ We built Access to solve a problem here at Cloudflare: our VPN. Our team members hated the slowness and inconvenience of VPN, but that wasn’t the issue we needed to solve. The security risks posed by a VPN required a better solution. ]]></description>
            <content:encoded><![CDATA[ <p>We built Access to solve a problem here at Cloudflare: our VPN. Our team members hated the slowness and inconvenience of VPN, but that wasn’t the issue we needed to solve. The security risks posed by a VPN required a better solution.</p><p>VPNs punch holes in the <a href="https://www.cloudflare.com/learning/access-management/what-is-the-network-perimeter/">network perimeter</a>. Once inside, individuals can access everything. This can include critically sensitive content like private keys, cryptographic salts, and log files. Cloudflare is a security company; this situation was unacceptable. We needed a better method that gives every application control over precisely who is allowed to reach it.</p><p>Access meets that need. We started by moving our browser-based applications behind Access. Team members could connect to applications faster, from anywhere, while we improved the security of the entire organization. However, we weren’t yet ready to turn off our VPN, as some tasks are better done through a command line. We cannot #EndTheVPN without replacing all of its use cases. Reaching a server from the command line required us to fall back to our VPN.</p><p>Today, we’re releasing a beta command line tool to help your team, and ours. Before we started using this feature at Cloudflare, curling a server required me to stop, find my VPN client and credentials, login, and run my curl command. With Cloudflare’s command line tool, <code>cloudflared</code>, and Access, I can run <code>$ cloudflared access curl https://example.com/api</code> and Cloudflare authenticates my request to the server. I save time and the security team at Cloudflare can control who reaches that endpoint (and monitor the logs).</p>
    <div>
      <h3>Protect your API with Cloudflare Access</h3>
      <a href="#protect-your-api-with-cloudflare-access">
        
      </a>
    </div>
    <p>To protect an <a href="https://www.cloudflare.com/learning/security/api/what-is-an-api/">API</a> with Access, you’ll follow the same <a href="https://developers.cloudflare.com/access/setting-up-access/securing-applications/">steps</a> that you use to protect a browser-based application. Start by adding the hostname where your API is deployed to your Cloudflare account.</p><p>Just like web applications behind Access, you can create granular policies for different paths of your HTTP API. Cloudflare Access will evaluate every request to the API for permission based on settings you configure. Placing your API behind Access means requests from any operation, CLI or other, will continue to be gated by Cloudflare. You can continue to use your API keys, if needed, as a second layer of security.</p>
    <div>
      <h3>Reach a protected API</h3>
      <a href="#reach-a-protected-api">
        
      </a>
    </div>
    <p>Cloudflare Access protects your application by checking for a valid JSON Web Token (JWT), whether the request comes through a browser or from the command line. We <a href="https://developers.cloudflare.com/access/setting-up-access/validate-jwt-tokens/">issue and sign</a> that JWT when you successfully login with your identity provider. That token contains claims about your identity and session. The Cloudflare network looks at the claims in that token to determine if the request should proceed to the target application.</p><p>When you use a browser with Access, we redirect you to your identity provider, you login, and we store that token in a cookie. Authenticating from the command line requires a different flow, but relies on the same principles. When you need to reach an application behind Access from your command line, the Cloudflare CLI tool, <code>cloudflared</code>, launches a browser window so that you can login with your identity provider. Once you login, Access will generate a JWT for your session, scoped to your user identity.</p><p>Rather than placing that JWT in a cookie, Cloudflare transfers the token in a cryptographically secure handoff to your machine. The client stores the token for you so that you don’t need to re-authenticate each time. The token is valid for the session duration as configured in Access.</p><p>When you make requests from your command line, Access will look for an HTTP header, <code>cf-access-token</code>, instead of a cookie. We’ll evaluate the token in that header on every request. <b>If you use cURL, we can help you move even faster.</b> <code>cloudflared</code> includes a subcommand that wraps cURL and injects the JWT into the header for you.</p>
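<p>To make the handoff concrete, here is a minimal sketch of attaching that token to an outgoing request. The <code>cf-access-token</code> header name comes from the text above; the helper function itself is hypothetical, not part of <code>cloudflared</code>:</p>

```javascript
// Hypothetical helper: copy a request init and attach the Access JWT in
// the cf-access-token header, which Access checks instead of a cookie.
function withAccessToken(init, token) {
  return { ...init, headers: { ...(init.headers || {}), 'cf-access-token': token } };
}

// Usage sketch: fetch('https://example.com/api', withAccessToken({ method: 'GET' }, jwt))
```

<p>This is essentially what the cURL wrapper automates for you: it injects the stored JWT on each request so you never have to paste it by hand.</p>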
    <div>
      <h3>Why use cloudflared to reach your application?</h3>
      <a href="#why-use-cloudflared-to-reach-your-application">
        
      </a>
    </div>
    <p>With <code>cloudflared</code> and its cURL wrapper, you can perform any cURL operation against an API protected by Cloudflare Access.</p><ul><li><p><b>Control endpoint access for specific users:</b> Cloudflare Access can be configured to protect specific endpoints. For example, you can create a rule that only a small group within your team can reach a particular URL path. You can apply that granular protection to sensitive endpoints so that you control who can reach those, while making other parts of the tool available to the full team.</p></li><li><p><b>Download sensitive data:</b> Placing applications with sensitive data behind Access lets you control who can reach that information. If a particular file is stored at a known location, you can save time by downloading it to your machine from the command line instead of walking through the UI flow.</p></li></ul>
    <div>
      <h3>What's next?</h3>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>CLI authentication is available today to all Access customers through the <code>cloudflared</code> tool. Just add the API hostname to your Cloudflare account and enable Access to start building policies that control who can reach that API. If you do not have an Access subscription yet, you can read more about the plans <a href="https://www.cloudflare.com/products/cloudflare-access/">here</a> and sign up.</p><p>Once you’re ready to continue ditching your VPN, follow this <a href="https://developers.cloudflare.com/access/cli/connecting-from-cli/">link</a> to install <code>cloudflared</code> today. The tool is in beta and does not yet support automated scripting or service-to-service connections. Full instructions and known limitations can be found <a href="https://developers.cloudflare.com/access/cli/connecting-from-cli/">here</a>. If you are interested in providing feedback, you can post your comments in this <a href="https://community.cloudflare.com/t/cloudflare-access-cli-auth-beta/37564">thread</a>.</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Access]]></category>
            <category><![CDATA[VPN]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Salt]]></category>
            <category><![CDATA[Passwords]]></category>
            <category><![CDATA[SASE]]></category>
            <guid isPermaLink="false">6jHWUuoTWxxosKdUqelM0K</guid>
            <dc:creator>Sam Rhea</dc:creator>
        </item>
        <item>
            <title><![CDATA[Using Cloudflare Workers to identify pwned passwords]]></title>
            <link>https://blog.cloudflare.com/using-cloudflare-workers-to-identify-pwned-passwords/</link>
            <pubDate>Mon, 26 Feb 2018 12:04:56 GMT</pubDate>
            <description><![CDATA[ Last week Troy Hunt launched his Pwned Password v2 service which has an API handled and cached by Cloudflare using a clever anonymity scheme.  The following simple code can check if a password exists in Troy's database without sending the password to Troy. ]]></description>
            <content:encoded><![CDATA[ <p>Last week Troy Hunt launched his <a href="https://www.troyhunt.com/ive-just-launched-pwned-passwords-version-2/">Pwned Password v2</a> service which has an API handled and cached by Cloudflare using a <a href="/validating-leaked-passwords-with-k-anonymity/">clever anonymity scheme</a>.</p><p>The following simple code can check if a password exists in Troy's database without sending the password to Troy. The details of how it works are found in the blog post above.</p>
            <pre><code>use strict;
use warnings;

use LWP::Simple qw/$ua get/;
$ua-&gt;agent('Cloudflare Test/0.1');
use Digest::SHA1 qw/sha1_hex/;

uc(sha1_hex($ARGV[0]))=~/^(.{5})(.+)/;
print get("https://api.pwnedpasswords.com/range/$1")=~/$2/?'Pwned':'Ok', "\n";</code></pre>
            <p>It's just as easy to implement the same check in other languages, such as JavaScript, which made me realize that I could incorporate the check into a <a href="/introducing-cloudflare-workers/">Cloudflare Worker</a>. With a little help from people who know JavaScript far better than me, I wrote the <a href="https://gist.github.com/jgrahamc/21f31c8fb4b2c27bda4f605197d5143f">following Worker</a>:</p>
            <pre><code>addEventListener('fetch', event =&gt; {
  event.respondWith(fetchAndCheckPassword(event.request))
})

async function fetchAndCheckPassword(req) {
  if (req.method == "POST") {
    try {
      const post = await req.formData()
      const pwd = post.get('password')
      const enc = new TextEncoder("utf-8").encode(pwd)

      let hash = await crypto.subtle.digest("SHA-1", enc)
      let hashStr = hex(hash).toUpperCase()
      
      const prefix = hashStr.substring(0, 5)
      const suffix = hashStr.substring(5)

      const pwndpwds = await fetch('https://api.pwnedpasswords.com/range/' + prefix)
      const t = await pwndpwds.text()
      const pwnd = t.includes(suffix)

      let newHdrs = new Headers(req.headers)
      newHdrs.set('Cf-Password-Pwnd', pwnd?'YES':'NO')

      const init = {
        method: 'POST',
        headers: newHdrs,
        body: post
      }

      return await fetch(req.url, init)    
    } catch (err) {
      return new Response('Internal Error')
    }
  }
  
  return await fetch(req)
}

function hex(a) {
  var h = ""
  var b = new Uint8Array(a)
  for(var i = 0; i &lt; b.length; i++){
    var hi = b[i].toString(16)
    h += hi.length === 1?"0"+hi:hi
  }
  return h
}</code></pre>
            <p>This Worker can be used to intercept a request passing through Cloudflare to a Cloudflare site. It looks at POST requests, extracts a field called <code>password</code>, and checks it against Troy Hunt's service.</p><p>It then adds an HTTP request header, <code>Cf-Password-Pwnd</code>, with either the value <code>YES</code> or <code>NO</code> depending on whether the password being handled is found in the database or not.</p><p>The POST request is then passed on to the origin server for handling, with the extra header inserted. This could, for example, be used on a signup page to check whether the password a user is hoping to use has already been found in a leak. The server would simply look at the header.</p><p>Clearly, this code isn't completely production ready (it does a bad job of handling failure, for example), but it gives a good idea of the power of a Cloudflare Worker to perform a subrequest to an API as part of normal request processing by Cloudflare and augment a request with information.</p>
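<p>On the origin side, consuming the result is a one-line check. A minimal sketch (the handler shape is illustrative; only the header name and its <code>YES</code>/<code>NO</code> values come from the Worker above, and note that Node.js lower-cases incoming header names):</p>

```javascript
// Trust the header injected by the Worker upstream: reject sign-ups
// whose password appears in the breach database.
function passwordIsPwned(headers) {
  return headers['cf-password-pwnd'] === 'YES';
}
```
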
    <div>
      <h3>Trying it out</h3>
      <a href="#trying-it-out">
        
      </a>
    </div>
    <p>To test it out I created a simple page that just returns the received HTTP request headers as text and deployed this as a 'signup' page with the Worker code above routed to it.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6p6dvgMpxz9HWG5YylXrik/e96f93eded1a139c56086104e532da6b/Screen-Shot-2018-02-26-at-11.23.45.png" />
            
            </figure><p>And checked that a simple GET request was <i>not</i> handled by the Worker (notice that the <code>Cf-Password-Pwnd</code> header is not present):</p>
            <pre><code>$ curl https://signup.example.com
Host: signup.example.com
Connection: Keep-Alive
Accept-Encoding: gzip
Cf-Ipcountry: US
Cf-Ray: 3f329308132f92b8-SJC
X-Forwarded-Proto: https
Cf-Visitor: {"scheme":"https"}
Accept: */*
User-Agent: curl/7.26.0</code></pre>
            <p>But a POST request with a password results in an extra header. Clearly no one should be using the password <code>12345</code>.</p>
            <pre><code>$ curl -X POST -d 'password=12345' https://signup.example.com
Host: signup.example.com
Connection: Keep-Alive
Accept-Encoding: gzip
Cf-Ipcountry: US
Cf-Ray: 3f3294e714a36d42-SJC
Content-Length: 130
X-Forwarded-Proto: https
Cf-Visitor: {"scheme":"https"}
Content-Type: application/x-www-form-urlencoded
Accept: */*
Cf-Password-Pwnd: YES
User-Agent: curl/7.26.0</code></pre>
            <p>But it looks like the password <code>kRc4qMwAtexDVZVygPnSt7LP5jPFsUDt</code> is safe for the time being:</p>
            <pre><code>$ curl -X POST -d 'password=kRc4qMwAtexDVZVygPnSt7LP5jPFsUDt' https://signup.example.com
Host: signup.example.com
Connection: Keep-Alive 
Accept-Encoding: gzip
Cf-Ipcountry: US
Cf-Ray: 3f329675e7f29625-SJC
Content-Length: 157
X-Forwarded-Proto: https
Cf-Visitor: {"scheme":"https"}
Content-Type: application/x-www-form-urlencoded
Accept: */*
Cf-Password-Pwnd: NO
User-Agent: curl/7.26.0</code></pre>
            <p>The power of Cloudflare Workers comes from the ability to run standard JavaScript written against the <a href="https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API">Service Workers API</a> on Cloudflare's edge nodes around the world. Small snippets of code can be used to transform and enhance requests and responses, build responses from multiple API calls, and interact with the Cloudflare cache.</p><p>Read more in the <a href="https://developers.cloudflare.com/workers/">developer documentation</a>.</p><hr /><p><i>If you have a worker you'd like to share, or want to check out workers from other Cloudflare users, visit the </i><a href="https://community.cloudflare.com/tags/recipe-exchange"><i>“Recipe Exchange”</i></a><i> in the Workers section of the </i><a href="https://community.cloudflare.com/c/developers/workers"><i>Cloudflare Community Forum</i></a><i>.</i></p> ]]></content:encoded>
            <category><![CDATA[Passwords]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Programming]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">NGbAdBYqiYoKOCBhmgVkN</guid>
            <dc:creator>John Graham-Cumming</dc:creator>
        </item>
        <item>
            <title><![CDATA[Validating Leaked Passwords with k-Anonymity]]></title>
            <link>https://blog.cloudflare.com/validating-leaked-passwords-with-k-anonymity/</link>
            <pubDate>Wed, 21 Feb 2018 19:00:44 GMT</pubDate>
            <description><![CDATA[ Today, v2 of Pwned Passwords was released as part of the Have I Been Pwned service offered by Troy Hunt. Containing over half a billion real world leaked passwords, this database provides a vital tool for correcting the course of how the industry combats modern threats against password security. ]]></description>
            <content:encoded><![CDATA[ <p>Today, <a href="https://www.troyhunt.com/ive-just-launched-pwned-passwords-version-2/">v2 of <i>Pwned Passwords</i> was released</a> as part of the <i>Have I Been Pwned</i> service offered by Troy Hunt. Containing over half a billion real world leaked passwords, this database provides a vital tool for correcting the course of how the industry combats modern threats against password security.</p><p>I have written about how we need to rethink password security and <i>Pwned Passwords v2</i> in the following post: <a href="/how-developers-got-password-security-so-wrong/"><i>How Developers Got Password Security So Wrong</i></a>. Instead, in this post I want to discuss one of the technical contributions Cloudflare has made towards protecting user information when using this tool.</p><p>Cloudflare continues to support <i>Pwned Passwords</i> by providing <a href="https://www.cloudflare.com/learning/cdn/what-is-a-cdn/">CDN</a> and security functionality such that the data can easily be made available for download in raw form to organisations to protect their customers. Further, as part of the second iteration of this project, I have also worked with Troy on designing and implementing <a href="https://www.cloudflare.com/learning/security/api/what-is-api-endpoint/">API endpoints</a> that support anonymised <i>range queries</i>, which function as an additional, client-visible layer of security for those consuming the API.</p><p>This contribution allows <i>Pwned Passwords</i> clients to use <i>range queries</i> to search for breached passwords, without having to disclose a complete unsalted password hash to the service.</p>
    <div>
      <h3>Getting Password Security Right</h3>
      <a href="#getting-password-security-right">
        
      </a>
    </div>
    <p>Over time, the industry has realised that complex password composition rules (such as requiring a minimum number of special characters) have done little to improve user behaviour: they have not stopped users from putting personal information in passwords, choosing common passwords, or reusing previously breached passwords<a href="#fn1">[1]</a>. <a href="https://www.cloudflare.com/learning/bots/what-is-credential-stuffing/">Credential Stuffing</a> has become a real threat recently; usernames and passwords are obtained from compromised websites and then injected into other websites until accounts that reuse those credentials are compromised.</p><p>This fundamentally works because users reuse passwords across different websites; when credentials are breached on one site, they can be replayed on others. Here are some examples of how credentials can be obtained from insecure websites:</p><ul><li><p>websites which don't use <a href="https://www.cloudflare.com/learning/bots/what-is-rate-limiting/">rate limiting</a> or challenge login requests can have users' login credentials breached by brute-force attacks using common passwords,</p></li><li><p>database dumps from hacked websites can be taken offline and the password hashes cracked; modern GPUs make this very efficient for dictionary passwords (even with algorithms like Argon2, PBKDF2 and BCrypt),</p></li><li><p>many websites still don't use any form of password hashing; once breached, passwords can be captured in raw form,</p></li><li><p>proxy attacks or hijacking a web server can allow passwords to be captured before they're hashed.</p></li></ul><p>This becomes a problem with password reuse: having obtained real-life username/password combinations, attackers can inject them into other websites (such as payment gateways, social networks, etc.) until access is obtained to more accounts (often of higher value than the original compromised site).</p><p>Under <a href="https://pages.nist.gov/800-63-3/sp800-63b.html">recent NIST guidance</a>, it is a requirement, when storing or updating passwords, to ensure they do not contain values which are commonly used, expected or compromised<a href="#fn2">[2]</a>. Research has found that 88.41% of users who received a <i>fear appeal</i> later set unique passwords, whilst only 4.45% of users who did not receive a fear appeal did so<a href="#fn3">[3]</a>.</p><p>Unfortunately, there are a lot of leaked passwords out there; the downloadable raw data from <i>Pwned Passwords</i> currently contains over 30 GB of password hashes.</p>
    <div>
      <h3>Anonymising Password Hashes</h3>
      <a href="#anonymising-password-hashes">
        
      </a>
    </div>
    <p>The key problem in checking passwords against the old <i>Pwned Passwords</i> API (and all similar services) lies in how passwords are checked: users are effectively required to submit unsalted hashes of their passwords to identify whether a password is breached. The hashes must be unsalted, as salting them would make them computationally difficult to search quickly.</p><p>Currently, two choices are available for validating whether a password has been leaked:</p><ul><li><p>submit the password (as an unsalted hash) to a third-party service, where the hash can potentially be stored for later cracking or analysis. For example, if you make an API call for a leaked password to a third-party API service using a WordPress plugin, the IP of the request can be used to identify the WordPress installation and then breach it when the password is cracked (such as from a later disclosure); or,</p></li><li><p>download the entire list of password hashes, uncompress the dataset and then run a search to see if your password hash is listed.</p></li></ul><p>Needless to say, this conflict can seem like being placed between a <a href="https://www.cloudflare.com/learning/security/how-to-improve-wordpress-security/">security-conscious rock</a> and an insecure hard place.</p>
    <div>
      <h3>The Middle Way</h3>
      <a href="#the-middle-way">
        
      </a>
    </div>
    
    <div>
      <h4>The Private Set Intersection (PSI) Problem</h4>
      <a href="#the-private-set-intersection-psi-problem">
        
      </a>
    </div>
    <p>Academic computer scientists have considered the problem of how two (or more) parties can validate the intersection of data (from two or more unequal sets of data either side already has) without either sharing information about what they have. Whilst this work is exciting, these techniques are new: they haven't been subject to long-term review by the cryptography community, and the cryptographic primitives involved have not been implemented in any major libraries. Additionally (but critically), PSI implementations have substantially higher overhead than our <i>k</i>-Anonymity approach (particularly for communication<a href="#fn4">[4]</a>). Even the current academic state-of-the-art is not within acceptable performance bounds for an API service, with the communication overhead being equivalent to downloading the entire set of data.</p>
    <div>
      <h4>k-Anonymity</h4>
      <a href="#k-anonymity">
        
      </a>
    </div>
    <p>Instead, our approach adds an additional layer of security by utilising a mathematical property known as <i>k</i>-Anonymity and applying it to password hashes in the form of <i>range queries</i>. As such, the <i>Pwned Passwords</i> API service never gains enough information about a non-breached password hash to be able to breach it later.</p><p><i>k</i>-Anonymity is used in multiple fields to release anonymised but workable datasets; for example, so that hospitals can release patient information for medical research whilst withholding personally identifying information. Formally, a data set can be said to hold the property of <i>k</i>-Anonymity if, for every record in a released table, there are at least <code>k − 1</code> other records identical to it.</p><p>By using this property, we are able to separate hashes into anonymised "buckets". A client can anonymise the user-supplied hash, download all leaked hashes in the same anonymised "bucket" as that hash, then do an offline check to see if the user-supplied hash is in that breached bucket.</p><p>In more concrete terms:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6FkAe2DhLMl2u6Dd4ZXx0P/fc76ccce5fe7570981625977f7d85ace/hash-bucket.png" />
            
            </figure><p>In essence, we turn the tables on password derivation functions; instead of seeking to salt hashes to the point at which they are unique (against identical inputs), we instead introduce ambiguity into what the client is requesting.</p><p>Given hashes are essentially fixed-length hexadecimal values, we are able to simply truncate them, instead of having to resort to a decision tree structure to filter down the data. This does mean buckets are of unequal sizes, but it allows clients to query in a single API request.</p><p>This approach can be implemented in a trivial way. Suppose a user enters the password <code>test</code> into a login form and the service they’re logging into is programmed to validate whether their password is in a database of leaked password hashes. First, the client generates a hash (in our example using SHA-1) of <code>a94a8fe5ccb19ba61c4c0873d391e987982fbbd3</code>. The client then truncates the hash to a predetermined number of characters (for example, 5), resulting in a Hash Prefix of <code>a94a8</code>. This Hash Prefix is then used to query the remote database for all hashes starting with that prefix (for example, by making an HTTP request to <code>example.com/a94a8.txt</code>). The entire hash list is then downloaded and each downloaded hash compared against the locally generated hash; if any match, the password is known to have been leaked.</p><p>As this can easily be implemented over HTTP, client-side caching can easily be used for performance purposes; the API is simple enough for developers to implement with little pain.</p><p>Below is a simple Bash implementation of how the <i>Pwned Passwords</i> API can be queried using <i>range queries</i> (<a href="https://gist.github.com/IcyApril/56c3fdacb3a640f37c245e5813b98b99">Gist</a>):</p>
            <pre><code>#!/bin/bash

echo -n Password:
read -s password
echo
# take the last field so both "hash" and "(stdin)= hash" openssl output styles work
hash="$(echo -n "$password" | openssl sha1 | awk '{print $NF}')"
upperCase="$(echo "$hash" | tr '[a-z]' '[A-Z]')"
prefix="${upperCase:0:5}"
response=$(curl -s https://api.pwnedpasswords.com/range/$prefix)
while read -r line; do
  lineOriginal="$prefix$line"
  if [ "${lineOriginal:0:40}" == "$upperCase" ]; then
    echo "Password breached."
    exit 1
  fi
done &lt;&lt;&lt; "$response"

echo "Password not found in breached database."
exit 0</code></pre>
            
    <div>
      <h3>Implementation</h3>
      <a href="#implementation">
        
      </a>
    </div>
    <p>Hashes (even in unsalted form) have two properties that are useful for anonymising data.</p><p>Firstly, the Avalanche Effect means that a small change in the input results in a very different hash; this means that you can't infer the contents of one hash from another hash. This holds true even in truncated form.</p><p>For example, the Hash Prefix <code>21BD1</code> contains 475 seemingly unrelated passwords, including:</p>
            <pre><code>lauragpe
alexguo029
BDnd9102
melobie
quvekyny</code></pre>
            <p>Further, hashes are fairly uniformly distributed. If we count the original 320 million leaked passwords (in Troy's dataset) by the first hexadecimal character of the hash, the difference between the number of hashes associated with the largest and the smallest Hash Prefix is ≈ 1%. The chart below shows hash counts by their first hexadecimal digit:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7IqJBkvelNZ0QVFoDEfrKY/96682f7d36961752b3f2c14bc6235f20/hashes_by_hash_prefix.png" />
            
            </figure><p>Algorithm 1 provides a simple check to discover how far hashes can be truncated whilst ensuring every "bucket" contains more than one hash. This requires every hash to be sorted by hexadecimal value. The algorithm, including an initial merge sort, runs in roughly <code>O(n log n + n)</code> time (worst-case):</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/rdI6HJEupI0hSGAsDvr4m/74045dd1a84842e41127e4a9d9213480/Screen-Shot-2018-02-18-at-23.37.15.png" />
            
            </figure><p>After identifying the Maximum Hash Prefix length, it is fairly easy to split the hashes into separate buckets, as described in Algorithm 3:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3peODDMG8GqOHrkA48VOPt/168f1da40b0b41342d571791c3789541/Screen-Shot-2018-02-18-at-23.02.02.png" />
            
            </figure><p>This implementation was originally evaluated on a dataset of over 320 million breached passwords, and we found that the Maximum Prefix Length to which all hashes can be truncated, whilst maintaining the property of k-anonymity, is 5 characters. When hashes are grouped together by a Hash Prefix of 5 characters, the median number of hashes associated with a Hash Prefix is 305. With the range of response sizes for a query varying from 8.6KB to 16.8KB (a median of 12.2KB), the dataset is usable in many practical scenarios and is certainly a good response size for an API client.</p><p>On the new <i>Pwned Passwords</i> dataset (with over half a billion passwords), whilst keeping the Hash Prefix length at 5, the average number of hashes returned is 478 - with the smallest buckets containing 381 hashes (<code>E0812</code> and <code>E613D</code>) and the largest containing 584 (<code>00000</code> and <code>4A4E8</code>).</p><p>Splitting the hashes into buckets by a Hash Prefix of 5 means a maximum of 16^5 = 1,048,576 buckets would be utilised (for SHA-1), assuming that every possible Hash Prefix contains at least one hash. In our datasets we found this to be the case: the number of distinct Hash Prefix values was equal to the highest possible number of buckets. Whilst for secure hashing algorithms it is computationally infeasible to invert the hash function, it is worth noting that as a SHA-1 hash is 40 hexadecimal characters long and 5 of those are used by the Hash Prefix, the total number of possible hashes associated with a Hash Prefix is 16^35 ≈ 1.39 × 10^42.</p>
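<p>The bucket arithmetic above is straightforward to sanity-check:</p>

```javascript
// 5 hex characters of prefix give 16^5 possible buckets; the remaining
// 35 characters mean each prefix could correspond to 16^35 hashes.
const buckets = 16 ** 5;    // 1,048,576
const perPrefix = 16 ** 35; // ≈ 1.39e42
```
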
    <div>
      <h3>Important Caveats</h3>
      <a href="#important-caveats">
        
      </a>
    </div>
    <p>It is important to note that where a user's password is already breached, an API call for a specific range of breached passwords can narrow the candidates used in a <a href="https://www.cloudflare.com/learning/bots/brute-force-attack/">brute-force attack</a>. Whilst users with breached passwords are already vulnerable to such attacks, a range query does reduce the number of search candidates - although the API service has no way of determining whether or not the client was searching for a password that was breached. Using a deterministic algorithm to run queries for other Hash Prefixes can help reduce this risk.</p><p>One reason this is important is that this implementation does not currently guarantee <i>l</i>-diversity, meaning a bucket may contain a hash which is in substantially higher use than the others. In the future we hope to use percentile-based usage information from the original breached data to better guarantee this property.</p><p>For general users, <i>Pwned Passwords</i> is usually exposed via a web interface which uses a JavaScript client to run this process; if the origin web server were hijacked to change the JavaScript being returned, this computation could be removed (and the password could be sent to the hijacked origin server). Whilst JavaScript requests are somewhat transparent to a developer inspecting them, this cannot be depended upon, and for technical users non-web clients are preferable.</p><p>The original use-case for this service was to be deployed privately in a Cloudflare data centre, where our services can use it to enhance user security, using <i>range queries</i> to complement the existing transport security. Depending on your risks, it is safer to deploy this service yourself (in your own data centre) and use the <i>k</i>-anonymity approach to validate passwords where services do not themselves have the resources to store an entire database of leaked password hashes.</p><p>I would strongly recommend against storing the <i>range queries</i> made by users of your service; but if you do for whatever reason, store them as aggregate analytics such that they cannot be linked back to any given user's password.</p>
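<p>The deterministic decoy idea mentioned above could be sketched along these lines - this construction is hypothetical and not part of the deployed service; a real deployment would need to tune the decoy count - deriving extra Hash Prefixes as a pure function of the real one, so repeated checks of the same password always produce the same query pattern and leak nothing new:</p>

```python
import hashlib

PREFIX_LEN = 5
NUM_DECOYS = 3  # illustrative; a real deployment would tune this

def query_prefixes(real_prefix: str) -> list[str]:
    """Deterministically derive decoy Hash Prefixes from the real one.
    Because the derivation is a pure function of the real prefix,
    re-checking the same password never reveals extra information
    through a changing query pattern."""
    prefixes = {real_prefix}
    seed = real_prefix
    while len(prefixes) < NUM_DECOYS + 1:
        seed = hashlib.sha1(seed.encode("utf-8")).hexdigest().upper()
        prefixes.add(seed[:PREFIX_LEN])
    # Sort so the real prefix's position in the batch reveals nothing.
    return sorted(prefixes)

print(query_prefixes("6C067"))
```

<p>An observer of the queries then sees four plausible prefixes rather than one, at the cost of fetching a few extra buckets per check.</p>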
    <div>
      <h3>Final Thoughts</h3>
      <a href="#final-thoughts">
        
      </a>
    </div>
    <p>Going forward, as we test this technology further, Cloudflare is looking into how we can use a private deployment of this service to better offer security functionality, both for log-in requests to our dashboard and for customers who want to protect against credential stuffing on their own websites using our edge network. We also seek to consider how we can incorporate recent work on the Private Set Intersection problem, alongside <i>l</i>-diversity, for additional security guarantees. As always, we'll keep you updated right here, on our blog.</p><hr /><ol><li><p>Campbell, J., Ma, W. and Kleeman, D., 2011. Impact of restrictive composition policy on user password choices. Behaviour &amp; Information Technology, 30(3), pp.379-388. <a href="#fnref1">↩︎</a></p></li><li><p>Grassi, P. A., Fenton, J. L., Newton, E. M., Perlner, R. A., Regenscheid, A. R., Burr, W. E., Richer, J. P., Lefkovitz, N. B., Danker, J. M., Choong, Y.-Y., Greene, K. K., and Theofanos, M. F. (2017). NIST Special Publication 800-63B Digital Identity Guidelines, chapter Authentication and Lifecycle Management. National Institute of Standards and Technology, U.S. Department of Commerce. <a href="#fnref2">↩︎</a></p></li><li><p>Jenkins, Jeffrey L., Mark Grimes, Jeffrey Gainer Proudfoot, and Paul Benjamin Lowry. "Improving password cybersecurity through inexpensive and minimally invasive means: Detecting and deterring password reuse through keystroke-dynamics monitoring and just-in-time fear appeals." Information Technology for Development 20, no. 2 (2014): 196-213. <a href="#fnref3">↩︎</a></p></li><li><p>De Cristofaro, E., Gasti, P. and Tsudik, G., 2012, December. Fast and private computation of cardinality of set intersection and union. In International Conference on Cryptology and Network Security (pp. 218-231). Springer, Berlin, Heidelberg. <a href="#fnref4">↩︎</a></p></li></ol> ]]></content:encoded>
            <category><![CDATA[Passwords]]></category>
            <category><![CDATA[Best Practices]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Salt]]></category>
            <guid isPermaLink="false">4k6ry6xuTMJwpexgEzV2hB</guid>
            <dc:creator>Junade Ali</dc:creator>
        </item>
        <item>
            <title><![CDATA[How Developers got Password Security so Wrong]]></title>
            <link>https://blog.cloudflare.com/how-developers-got-password-security-so-wrong/</link>
            <pubDate>Wed, 21 Feb 2018 19:00:11 GMT</pubDate>
            <description><![CDATA[ Both in our real lives, and online, there are times where we need to authenticate ourselves - where we need to confirm we are who we say we are. This can be done using three things. ]]></description>
            <content:encoded><![CDATA[ <p>Both in our real lives and online, there are times where we need to authenticate ourselves - to confirm we are who we say we are. This can be done using three things:</p><ul><li><p>Something you <i>know</i></p></li><li><p>Something you <i>have</i></p></li><li><p>Something you <i>are</i></p></li></ul><p>Passwords are an example of something you <i>know</i>; they were introduced in 1961 for authentication on a time-sharing computer at MIT. Shortly afterwards, a PhD researcher breached this system (by simply downloading a list of unencrypted passwords) and used the computer time allocated to others.</p><p>As time has gone on, developers have continued to store passwords insecurely and users have continued to set them weakly, yet no viable alternative for password security has emerged. To date, no system retains all the benefits that passwords offer, as researchers have rarely considered real-world constraints<a href="#fn1">[1]</a>. For example, when using fingerprints for authentication, engineers often forget that a sizable percentage of the population do not have usable fingerprints, and overlook hardware upgrade costs.</p>
    <div>
      <h3>Cracking Passwords</h3>
      <a href="#cracking-passwords">
        
      </a>
    </div>
    <p>In the 1970s, people started thinking about how to better store passwords, and cryptographic hashing started to emerge.</p><p>Cryptographic hashes work like trapdoors: whilst it's easy to hash a password, it's far harder (computationally infeasible for an ideal hashing algorithm) to turn that "hash" back into the original input. They are used in a lot of things, from speeding up file searches to the one-time password generators used by banks.</p><p>Passwords should ideally be hashed with specialised functions like Argon2, BCrypt or PBKDF2, which incorporate salting to prevent Rainbow Table attacks.</p><p>If you were to hash the password <code>p4$$w0rd</code> using the SHA-1 hashing algorithm, the output would be <code>6c067b3288c1b5c791afa04e12fb013ed2e84d10</code>. This output is the same every time the algorithm is run. As a result, attackers are able to create Rainbow Tables, which contain the hashes of common passwords; this information is then used to break password hashes (where the password and hash are listed in a Rainbow Table).</p><p>Algorithms like BCrypt essentially salt passwords with a random string before hashing them. This random string is stored alongside the password hash and helps make the password harder to crack by making the output unique. The hashing process is repeated many times (defined by a difficulty variable), each time adding the random salt onto the output of the hash and rerunning the hash computation.</p><p>For example, the BCrypt hash <code>$2a$10$N9qo8uLOickgx2ZMRZoMyeIjZAgcfl7p92ldGxad68LJZdL17lhWy</code> starts with <code>$2a$10$</code>, which indicates the algorithm used (BCrypt) and the difficulty (10), and contains a random salt of <code>N9qo8uLOickgx2ZMRZoMye</code> and a resulting hash of <code>IjZAgcfl7p92ldGxad68LJZdL17lhWy</code>. Storing the salt allows the password hash to be regenerated identically when the input is known.</p><p>Unfortunately, salting alone is no longer enough: passwords can be cracked ever more quickly using modern GPUs (which specialise in doing the same task over and over). When a site suffers a security breach, users' password hashes can be taken away in database dumps and cracked offline.</p><p>Additionally, websites that fail to rate-limit login requests or use CAPTCHAs can be targeted by Brute Force attacks: for a given user, an attacker will repeatedly try different (but common) passwords until they gain access to the account.</p><p>Where sites lock users out after a handful of failed login attempts, attacks can instead be adapted to move on quickly to a new account once the most common passwords have been attempted. Lists like the following (in some cases with many, many more passwords) can be used to attempt to breach accounts:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5uMRyoSntS2dkOn7C9Zc6v/c882908f5dfc1b8b5c257e8583f4b27a/common-weak-passwords.png" />
            
            </figure><p>The industry has tried to combat this problem with password composition rules: requiring users to comply with complex rules (such as including a minimum number of digits or punctuation symbols) before setting a password. Research has shown that this hasn't helped combat password reuse, weak passwords or users putting personal information in passwords.</p>
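<p>The contrast between unsalted and salted hashing described earlier can be seen directly in a few lines of Python (a sketch for illustration only; salted SHA-1 is shown to demonstrate the principle - use Argon2, BCrypt or PBKDF2 in practice):</p>

```python
import hashlib
import os

# An unsalted hash is a pure function of the password: every site,
# every user, every run produces the same digest for "p4$$w0rd" -
# which is precisely what lets attackers precompute Rainbow Tables.
unsalted_a = hashlib.sha1(b"p4$$w0rd").hexdigest()
unsalted_b = hashlib.sha1(b"p4$$w0rd").hexdigest()
print(unsalted_a == unsalted_b)  # True: deterministic, so precomputable

# A random per-user salt breaks that precomputation: the same
# password now yields a different stored digest for each user,
# so a single Rainbow Table no longer covers everyone.
salt_a, salt_b = os.urandom(16), os.urandom(16)
salted_a = hashlib.sha1(salt_a + b"p4$$w0rd").hexdigest()
salted_b = hashlib.sha1(salt_b + b"p4$$w0rd").hexdigest()
print(salted_a == salted_b)  # False: unique per user
```

<p>Dedicated password hashing functions add the second defence the text describes: repeating the computation many times so each guess is expensive even with the salt known.</p>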
    <div>
      <h4>Credential Stuffing</h4>
      <a href="#credential-stuffing">
        
      </a>
    </div>
    <p>Whilst it may seem that this is only a problem for websites that store passwords weakly, Credential Stuffing makes it worse still.</p><p>It is common for users to reuse passwords from site to site, meaning a username and password from a compromised website can be used to breach far more important accounts - like online banking gateways or government logins. When a password is reused, it takes just one website being breached to gain access to every other site a user holds those credentials for.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6l6Zm2PWsT6cXcXHpPevKX/87148160a2ed6d72bfd6288e156ecd28/this-is-not-fine-009-a7b6e6.png" />
            
            </figure><p> <a href="https://thenib.com/this-is-not-fine">This Is Not Fine - The Nib</a></p>
    <div>
      <h3>Fixing Passwords</h3>
      <a href="#fixing-passwords">
        
      </a>
    </div>
    <p>There are fundamentally three things that need to be done to fix this problem:</p><ul><li><p>Improving UX to support better user decisions</p></li><li><p>Improving developer education</p></li><li><p>Eliminating reuse of breached passwords</p></li></ul>
    <div>
      <h4>How Can I Secure Myself (or my Users)?</h4>
      <a href="#how-can-i-secure-myself-or-my-users">
        
      </a>
    </div>
    <p>Before discussing the things we're doing, I wanted to briefly cover what you can do to protect yourself now. For most users, there are three steps you can take immediately.</p><p>Use a Password Manager (like 1Password or LastPass) to set random, unique passwords for every site. Additionally, enable Two-Factor Authentication where possible; this uses something you <i>have</i>, in addition to the password you <i>know</i>, to validate you. It means that, alongside your password, you have to enter a short-lived code from a device like your phone before being able to log in to a site.</p><p>Two-Factor Authentication is supported on many of the world's most popular social media, banking and shopping sites. You can find out how to enable it on popular websites at <a href="https://www.turnon2fa.com/tutorials/">turnon2fa.com</a>. If you are a developer, you should take steps to ensure you support Two-Factor Authentication.</p><p>Set a secure, memorable password for your password manager; and yes, turn on Two-Factor Authentication for it (and keep your backup codes safe). You can find additional security tips (including tips on how to create a secure main password) in my blog post: <a href="/cyber-security-advice-for-your-parents/">Simple Cyber Security Tips</a>.</p><p>Developers should look to abolish bad-practice composition rules (and simplify them as much as possible). Password expiration policies do more harm than good, so seek to do away with them. For further information, refer to the blog post by the UK's National Cyber Security Centre: <a href="https://www.ncsc.gov.uk/articles/problems-forcing-regular-password-expiry">The problems with forcing regular password expiry</a>.</p><p>Finally, Troy Hunt has an excellent blog post on passwords for users and developers alike: <a href="https://www.troyhunt.com/passwords-evolved-authentication-guidance-for-the-modern-era/">Passwords Evolved: Authentication Guidance for the Modern Era</a>.</p>
    <div>
      <h4>Improving Developer Education</h4>
      <a href="#improving-developer-education">
        
      </a>
    </div>
    <p>Developers should seek to build a culture of security in the organisations where they work: talk about security, talk about the benefits of challenging malicious login requests, and talk about password hashing in simple terms.</p><p>If you're working on an open-source project that handles authentication, expose easy password hashing APIs - for example the <code>password_hash</code>, <code>password_needs_rehash</code> &amp; <code>password_verify</code> functions in modern PHP versions.</p>
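<p>To illustrate what an "easy" API looks like, PHP's hash/verify pair could be mirrored in Python along these lines (a hypothetical sketch: the function names and storage format are made up for illustration, with the standard library's PBKDF2 standing in for a dedicated password hashing library):</p>

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # the difficulty knob, stored alongside the hash

def password_hash(password: str) -> str:
    """Hash a password with a fresh random salt; everything needed to
    verify later (iterations, salt, digest) is encoded in the result."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return f"{ITERATIONS}${salt.hex()}${digest.hex()}"

def password_verify(password: str, stored: str) -> bool:
    """Recompute the digest from the stored salt and iteration count,
    then compare in constant time to avoid timing side channels."""
    iterations, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(digest.hex(), digest_hex)

stored = password_hash("correct horse battery staple")
print(password_verify("correct horse battery staple", stored))  # True
print(password_verify("p4$$w0rd", stored))  # False
```

<p>The point of such an API is that callers never touch salts or difficulty parameters directly, which removes the most common ways developers get password storage wrong.</p>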
    <div>
      <h4>Eliminating Password Reuse</h4>
      <a href="#eliminating-password-reuse">
        
      </a>
    </div>
    <p>We know that complex password composition rules are largely ineffective, and recent guidance has followed suit. A better alternative to composition rules is to block users from signing up with passwords which are known to have been breached. Under <a href="https://pages.nist.gov/800-63-3/sp800-63b.html">recent NIST guidance</a>, it is a requirement, when storing or updating passwords, to ensure they do not contain values which are commonly used, expected or compromised<a href="#fn2">[2]</a>.</p><p>This is easier said than done: the recent version of Troy Hunt's <i>Pwned Passwords</i> database contains over half a billion passwords (over 30 GB uncompressed). Whilst developers can use API services to check if a password is reused, this requires sending either the raw password or the password as an unsalted hash. This can be especially problematic when multiple services handle authentication in a business and each has to store a large quantity of passwords.</p><p>This is a problem I've started looking into recently; as part of our contribution to Troy Hunt's <i>Pwned Passwords</i> database, I have designed a <i>range search</i> API that allows developers to check if a password is reused without needing to share the password (even in hashed form) - instead only needing to send a short segment of the cryptographic hash. You can find more information on this contribution in the post: <a href="/validating-leaked-passwords-with-k-anonymity/">Validating Leaked Passwords with k-Anonymity</a>.</p><p>Version 2 of <i>Pwned Passwords</i> is now available - you can find more information on how it works in Troy Hunt's blog post "<a href="https://www.troyhunt.com/ive-just-launched-pwned-passwords-version-2/">I've Just Launched Pwned Passwords, Version 2</a>".</p><hr /><ol><li><p>Bonneau, J., Herley, C., Van Oorschot, P.C. and Stajano, F., 2012, May. The quest to replace passwords: A framework for comparative evaluation of web authentication schemes. In Security and Privacy (SP), 2012 IEEE Symposium on (pp. 553-567). IEEE. <a href="#fnref1">↩︎</a></p></li><li><p>Grassi, P. A., Fenton, J. L., Newton, E. M., Perlner, R. A., Regenscheid, A. R., Burr, W. E., Richer, J. P., Lefkovitz, N. B., Danker, J. M., Choong, Y.-Y., Greene, K. K., and Theofanos, M. F. (2017). NIST Special Publication 800-63B Digital Identity Guidelines, chapter Authentication and Lifecycle Management. National Institute of Standards and Technology, U.S. Department of Commerce. <a href="#fnref2">↩︎</a></p></li></ol> ]]></content:encoded>
            <category><![CDATA[Passwords]]></category>
            <category><![CDATA[Best Practices]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Salt]]></category>
            <guid isPermaLink="false">6lsrdeZvgI7O6CidQWI49j</guid>
            <dc:creator>Junade Ali</dc:creator>
        </item>
    </channel>
</rss>