
Author: Tronserve admin

Saturday 8th May 2021 03:31 AM

How Language Shapes Password Security



Despite differences in language and culture, both Chinese- and English-language Internet users apparently find common ground in using easily guessable password variants of “123456.” Yet a recent study comparing password patterns between the two languages also found notable and unique features in Chinese passwords that have big implications for Internet security beyond China.


The password habits of Chinese-language users have been surprisingly understudied given that they make up more than 20 percent of all Internet users worldwide. At least 854 million people use the Internet in China alone, more than double the entire population of the United States. That is why a group of Chinese and U.S. researchers set out to test how password security among both Chinese- and English-language users stands up against the best cracking algorithms.


“Our work may be among the first studies to examine the passwords of different languages,” says Ding Wang, an information security researcher at Peking University, in Beijing.


Wang and his colleagues analyzed 106 million real passwords from nine Web services (73 million passwords from six Chinese-language services and 33 million passwords from three English-language services) that were exposed by hackers and leaked online between 2009 and 2012. They were careful to directly compare the security of passwords only from similar Web service counterparts among the mix of social forums, gaming services, e-commerce websites, and programmer forums, along with the Yahoo Internet portal on the English-language side of the data set. Their results appear in a paper presented at the 28th USENIX Security Symposium, held in Santa Clara, Calif., from 14 to 16 August 2019.


What may seem like a strong password based on English-language assumptions may very well be quite weak and easy to guess from a Chinese-language perspective. But many of the world’s popular Web services, including some homegrown Chinese services, approach password security from an English-language perspective.  


The researchers pointed to the example of the popular Chinese password “woaini1314,” which is currently rated “strong” by password strength meters used by AOL, Google, and even the well-known Chinese social network Sina Weibo (and by IEEE Spectrum’s parent organization, IEEE). But speakers of Mandarin Chinese, the most widely spoken variety of Chinese, can very quickly guess the “woaini1314” password because “woaini” in pinyin (the romanization system for Chinese) means “I love you,” and “1314” sounds like “forever” in Chinese.


One major difference between Chinese-language and English-language passwords is that many Chinese-language users favor passwords consisting purely of digits. Beyond the infamous “123456” password, other popular passwords among Chinese-language users include “111111,” “123123,” and “123321.” Playing on the love theme, “5201314” is used because it sounds just like the phrase “I love you forever and ever” in Chinese. Some popular password segments will add a letter to the string of digits, such as “a12345” and “12345a.”


Chinese-language users also often use their mobile phone numbers or certain dates (perhaps their birthdays) in passwords, something that English-language users don’t do as often. By contrast, English-language users frequently compose passwords made entirely of letters and lean toward certain words or phrases such as the easily guessable “password,” “letmein,” “sunshine,” and “princess.” Some of their most popular passwords include “abcdef” and “abc123” alongside “123456.”


Passwords that use only digits are much easier to crack than letter-only passwords of the same length, because each position draws on just 10 possible digits as opposed to the 26 letters of the modern English alphabet. But Chinese-language users often demonstrated incredibly complex and creative passwords: Some members of the Chinese Software Developer Network (CSDN) service combined programming language commands with traditional Chinese poems.
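The comparison of 10 digits against 26 letters can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes every character is chosen uniformly at random, which real users do not do, and the length of 8 is an arbitrary choice for the example. The function names are hypothetical.

```python
import math

def keyspace(alphabet_size: int, length: int) -> int:
    """Count the distinct strings of a given length over an alphabet."""
    return alphabet_size ** length

def entropy_bits(alphabet_size: int, length: int) -> float:
    """Bits of entropy, assuming each character is picked uniformly at random."""
    return length * math.log2(alphabet_size)

LENGTH = 8  # arbitrary example length, not a figure from the study

digit_space = keyspace(10, LENGTH)    # digits only
letter_space = keyspace(26, LENGTH)   # lowercase letters only

print(f"digits : {digit_space:,} candidates, {entropy_bits(10, LENGTH):.1f} bits")
print(f"letters: {letter_space:,} candidates, {entropy_bits(26, LENGTH):.1f} bits")
print(f"letters/digits ratio: {letter_space / digit_space:,.0f}x")
```

Under these assumptions an 8-character lowercase password has roughly 2,000 times as many candidates as an 8-digit one. Popular choices such as “123456” are far weaker than either figure suggests, because attackers try common strings first.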


“Chinese users can be really creative with combinations of letters and digits,” says Yuan Tian, a computer scientist at the University of Virginia in Charlottesville, Va., and a coauthor of the study.


The password files used by the researchers contained hashes of leaked or hacked passwords, not plain-text versions of the passwords themselves. The researchers tried to crack both Chinese-language and English-language passwords using two state-of-the-art password-cracking algorithms. They tested the Markov-chain model, which assigns probabilities to password characters based on their relationships with one another, and the probabilistic context-free grammars (PCFG) model, which parses passwords into letter segments, digit segments, and symbol segments before generating guesses in order from the most likely combinations to the least likely.
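The segment-parsing idea behind PCFG can be sketched in a few lines. This is a toy illustration, not the researchers’ implementation: the training list is made up for the example, all function names are hypothetical, and a real cracker would also enumerate guesses in descending probability order, which is omitted here.

```python
import re
from collections import Counter

# One alternative per segment class: letters (L), digits (D), symbols (S).
SEGMENT = re.compile(r"(?P<L>[A-Za-z]+)|(?P<D>[0-9]+)|(?P<S>[^A-Za-z0-9]+)")

def segment(password):
    """Split a password into (class, text) runs, e.g. [('L', 'abc'), ('D', '123')]."""
    return [(m.lastgroup, m.group()) for m in SEGMENT.finditer(password)]

def base_structure(password):
    """Describe the password's shape, e.g. 'woaini1314' -> 'L6D4'."""
    return "".join(f"{kind}{len(text)}" for kind, text in segment(password))

def train(passwords):
    """Count how often each structure and each slot filler appears."""
    structures = Counter()
    fillers = {}  # slot such as 'D4' -> Counter of strings seen in that slot
    for pw in passwords:
        structures[base_structure(pw)] += 1
        for kind, text in segment(pw):
            fillers.setdefault(f"{kind}{len(text)}", Counter())[text] += 1
    return structures, fillers

def probability(password, model):
    """P(structure) times the product of P(filler | slot) for each segment."""
    structures, fillers = model
    total = sum(structures.values())
    if total == 0:
        return 0.0
    p = structures[base_structure(password)] / total
    for kind, text in segment(password):
        slot = fillers.get(f"{kind}{len(text)}")
        if not slot:
            return 0.0
        p *= slot[text] / sum(slot.values())
    return p

# Made-up training data for illustration only.
TRAINING = ["woaini1314", "woaini520", "abc123", "123456", "love1314"]
MODEL = train(TRAINING)

print(base_structure("woaini1314"))      # L6D4
print(probability("woaini1314", MODEL))  # seen in training
print(probability("abc520", MODEL))      # unseen, but built from seen parts
```

The last line shows why the approach generalizes: “abc520” never appears in the training list, yet it receives a nonzero probability because both its structure and its segments were observed separately.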


The team also improved the PCFG approach by adjusting it to account for specific password patterns more common to Chinese-language users. For example, they added number segments in the popular date format and Chinese names as written in the romanized Pinyin system. They also gave their PCFG-based algorithm the capability to process the interleaving patterns (strings of alternating digits and letters) found in various Chinese passwords.
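One way to picture these language-specific segments is as extra tags layered on top of plain digit and letter runs. The sketch below is an assumption-heavy illustration rather than the paper’s method: the YYYYMMDD format, the 1900 to 2099 year range, and the tiny syllable list are all choices made for this example, and the function names are hypothetical.

```python
import re
from datetime import datetime

# A tiny, incomplete syllable list used only for illustration.
PINYIN_SYLLABLES = {"wo", "ai", "ni", "wang", "li", "zhang", "liu", "chen"}
MAX_SYLLABLE_LEN = 6  # longest standard pinyin syllables, e.g. "zhuang"

def is_date_segment(digits):
    """True if the string is a valid YYYYMMDD date in an assumed year range."""
    if not re.fullmatch(r"[0-9]{8}", digits):
        return False
    try:
        parsed = datetime.strptime(digits, "%Y%m%d")
    except ValueError:
        return False  # e.g. month 13 or day 31 in a 30-day month
    return 1900 <= parsed.year <= 2099

def is_pinyin_segment(letters):
    """True if the whole string splits into syllables from PINYIN_SYLLABLES."""
    s = letters.lower()
    # reachable[i] is True when s[:i] can be fully split into known syllables.
    reachable = [True] + [False] * len(s)
    for end in range(1, len(s) + 1):
        for start in range(max(0, end - MAX_SYLLABLE_LEN), end):
            if reachable[start] and s[start:end] in PINYIN_SYLLABLES:
                reachable[end] = True
                break
    return bool(s) and reachable[len(s)]

def refine_tag(kind, text):
    """Upgrade a plain 'D' or 'L' segment to a more specific tag when it fits."""
    if kind == "D" and is_date_segment(text):
        return "DATE"
    if kind == "L" and is_pinyin_segment(text):
        return "PINYIN"
    return kind

print(refine_tag("D", "19900512"))  # DATE
print(refine_tag("L", "woaini"))    # PINYIN
print(refine_tag("L", "password"))  # L
```

Tagging a segment as a date or a pinyin name lets a guesser draw from a much smaller and more probable set of candidates than treating it as an arbitrary run of digits or letters.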


Together, those efforts boosted the modified PCFG-based algorithm’s performance against the Chinese password data sets — it cracked between 98 percent and 188 percent more passwords than the general version of the algorithm.


The results also underlined key strengths and weaknesses of Chinese-language passwords in comparison with English-language passwords. Both types of algorithms cracked more of the easier Chinese passwords than English passwords when limited to 10,000 or fewer guess attempts. But the remaining Chinese passwords proved stronger than their English counterparts as the number of guesses escalated beyond 10,000 attempts.


The number of guesses matters because many Web services limit the number of online guesses before temporarily locking a user’s account. Leaked or stolen password storage files could allow hackers to make a theoretically unlimited number of offline guesses because they don’t have to cope with possibly being locked out of a Web service. But even offline guessing attacks are still limited by the cost-effectiveness of spending computing time and resources on a multitude of guess attempts.
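The gap between online and offline guessing can be estimated with rough arithmetic. The numbers below are assumptions picked for illustration, not figures from the study: a hypothetical lockout policy of 5 attempts per 15-minute window, and a hypothetical offline rate of one billion hash computations per second.

```python
import math

def online_days(guesses, attempts_per_window, window_minutes):
    """Approximate days needed when each lockout window allows a few attempts."""
    windows = math.ceil(guesses / attempts_per_window)
    return windows * window_minutes / (60 * 24)

def offline_seconds(guesses, hashes_per_second):
    """Seconds needed when the attacker can hash candidates locally."""
    return guesses / hashes_per_second

GUESSES = 10_000                  # the threshold discussed in the article
ATTEMPTS_PER_WINDOW = 5           # assumed lockout policy
WINDOW_MINUTES = 15               # assumed lockout duration
HASHES_PER_SECOND = 1_000_000_000 # assumed attacker speed

print(f"online : about {online_days(GUESSES, ATTEMPTS_PER_WINDOW, WINDOW_MINUTES):.1f} days")
print(f"offline: about {offline_seconds(GUESSES, HASHES_PER_SECOND):.6f} seconds")
```

Under these assumptions, 10,000 guesses take about three weeks against a rate-limited login form but a tiny fraction of a second against a stolen hash file. That is why passwords that fall within the first few thousand guesses are at risk even from online attacks, while stronger ones mainly matter once a hash file leaks.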


It’s also clear that individual Chinese-language speakers can do themselves a favor by avoiding predictable digit patterns such as “123456” and “111111” for their passwords, as well as the predictable letter and letter/digit hybrid patterns based on romantic themes of eternal love. (The same goes for English-language speakers still using “123456” and “abcdef”: just stop!)


The complexity of language’s influence on passwords may go even further within just the Chinese-language community. Chinese-language users mainly rely upon the same set of Chinese characters for reading and writing, but spoken Chinese has many regional differences based on local dialects that sound different when spoken. As just one example, the pronunciation of “I love you” in Mandarin Chinese, considered mainland China’s official national language, differs from the pronunciation of the same phrase in the Cantonese branch of Chinese spoken by many people living in or originating from places such as Hong Kong, Macau, and Guangdong.


Those regional distinctions in spoken Chinese were beyond the scope of this particular study. But Tian observed that there may be differences in password patterns if speakers of Cantonese, Hokkien, Shanghainese, or other regional variants of Chinese tried making passwords based on pronunciation.


For a deeper dive, the researchers hope to continue evaluating Chinese-language password patterns by conducting studies to better understand what Chinese Internet users are thinking when creating their passwords. They also raised the possibility of continuing their comparative studies of passwords in different languages beyond just Chinese and English. “For our future work, we want to cover passwords around the world beyond China,” Wang says.


IEEE SPECTRUM

