It's App Dev 101: Don't hard-code API tokens, encryption keys, and user credentials. But if you do, make sure to get them out of your code before committing to GitHub or other public code repositories.
Four years ago, GitHub introduced a search feature that made it easy to find passwords, encryption keys, and other sensitive information within publicly available repositories. The problem hasn't improved; last year, researchers found 1,500 Slack tokens across GitHub projects, which could have been abused by others to gain access to chats, files, and other sensitive data shared within private Slack teams.
Truffle Hog and Git Hound are two tools that help administrators and developers find secret keys accidentally leaked through their GitHub projects. They take different approaches, but the goal is the same: stop cryptographic secrets from being posted to public sites.
Truffle Hog "will go through the entire commit history of each branch, and check each diff from each commit, and evaluate the shannon entropy for both the base64 char set and hexadecimal char set for every blob of text greater than 20 characters comprised of those character sets in each diff," said the tool's developer, Dylan Ayrey. Shannon entropy, named after mathematician Claude E. Shannon, is a measure of randomness, and a high-entropy string is likely to be a cryptographic secret, such as an access token or a private key. Truffle Hog prints out the high-entropy strings it finds so administrators can investigate what's in the file. Written in Python, Truffle Hog needs only the GitPython library to run.
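The entropy check Ayrey describes can be sketched in a few lines of Python. This is a minimal illustration, not Truffle Hog's actual code: the helper names and the thresholds of 4.5 bits per character for base64 and 3.0 for hexadecimal are assumptions, and the part that walks every branch's commit history with GitPython is left out, so the sketch scans a single line of text.

```python
import math
import re
import string

# The two character sets the check is described as covering.
BASE64_CHARS = string.ascii_letters + string.digits + "+/="
HEX_CHARS = "0123456789abcdefABCDEF"

# Runs of more than 20 characters drawn from each set.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/=]{21,}")
HEX_RUN = re.compile(r"[0-9a-fA-F]{21,}")


def shannon_entropy(data: str, charset: str) -> float:
    """Shannon entropy of data in bits per character, over charset."""
    if not data:
        return 0.0
    entropy = 0.0
    for ch in charset:
        p = data.count(ch) / len(data)
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


def high_entropy_strings(line: str, b64_threshold: float = 4.5,
                         hex_threshold: float = 3.0) -> list:
    """Return candidate secrets in line (thresholds are assumed values)."""
    findings = []
    for match in BASE64_RUN.findall(line):
        if shannon_entropy(match, BASE64_CHARS) > b64_threshold:
            findings.append(match)
    for match in HEX_RUN.findall(line):
        # A hex string is also a base64 run, so avoid reporting it twice.
        if shannon_entropy(match, HEX_CHARS) > hex_threshold and match not in findings:
            findings.append(match)
    return findings


print(high_entropy_strings('token = "0123456789abcdef0123456789abcdef"'))
# prints ['0123456789abcdef0123456789abcdef']
```

In the example, the 32-character hex string uses all 16 hex digits equally, giving exactly 4 bits per character: above the assumed hex threshold but below the assumed base64 one. A repetitive string such as forty copies of "a" scores 0 and is ignored.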
Git Hound takes a different approach: It uses a Git plugin, written in Go, that scans files just before they are committed. The plugin searches for matches to regular expressions specified in a separate file, .githound.yml, and either prints a warning and allows the commit or fails and stops the commit from proceeding. Hound can "sniff changes since last commit and pass to git-commit when clean," said Ezekiel Gabrielse, the tool's developer. While it would be "pretty simple" to set up the check in a pre-commit hook, Gabrielse said the plugin gives more flexibility.
Using regular expressions lets Git Hound handle a broad range of sensitive information, since the list can include credentials, access tokens, and even file and system names. The plugin can sniff the changes since the last commit, the whole codebase, or even the entire repository history. Because .githound.yml isn't added to the GitHub repository, the regular expressions themselves stay private.
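The warn-or-fail regex check can be illustrated with a short Python sketch. It is not Git Hound's Go implementation: the rule set below is a hypothetical stand-in for .githound.yml (whose real format may differ), and the function names are made up for illustration. The sketch reads the staged changes with `git diff --cached`, inspects only added lines, and returns a nonzero exit code when a "fail" pattern matches. The AWS pattern reflects the usual shape of an access key ID, "AKIA" followed by 16 uppercase letters or digits.

```python
import re
import subprocess

# Hypothetical rules standing in for .githound.yml. "warn" matches are
# reported but allowed; "fail" matches block the commit.
RULES = {
    "warn": [r"(?i)password\s*[:=]"],
    "fail": [r"AKIA[0-9A-Z]{16}"],  # shape of an AWS access key ID
}


def scan_diff(diff_text: str, rules: dict = RULES) -> tuple:
    """Check lines added in a unified diff; return (warnings, failures)."""
    compiled = {level: [re.compile(p) for p in patterns]
                for level, patterns in rules.items()}
    warnings, failures = [], []
    for line in diff_text.splitlines():
        # Only added lines matter; skip the "+++ b/file" header line.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        added = line[1:]
        if any(p.search(added) for p in compiled.get("fail", [])):
            failures.append(added)
        elif any(p.search(added) for p in compiled.get("warn", [])):
            warnings.append(added)
    return warnings, failures


def staged_diff() -> str:
    """Return the staged changes, i.e. what the next commit would contain."""
    result = subprocess.run(["git", "diff", "--cached", "-U0"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def check_before_commit() -> int:
    """Report matches and return an exit code; nonzero should stop the commit."""
    warnings, failures = scan_diff(staged_diff())
    for line in warnings:
        print("warning: possible secret:", line)
    for line in failures:
        print("error: secret detected, commit blocked:", line)
    return 1 if failures else 0
```

If a script like this were installed as a pre-commit hook that calls `sys.exit(check_before_commit())`, Git would abort the commit whenever a "fail" pattern matched, which is the "pretty simple" hook-based alternative Gabrielse mentions.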
The timing of the check matters: Because Hound sniffs the code before it is committed, the security check becomes part of the developer's normal workflow. Security tools that fit into that workflow are more likely to be used at the right time.
Hard-coded keys shouldn't end up in public code repositories, but it happens far too often. Security researchers found almost 10,000 access keys for Amazon Web Services and Elastic Compute Cloud instances inside publicly accessible GitHub repositories, prompting Amazon to adopt the practice of regularly scanning GitHub for such keys and revoking them before they can be abused.
While it's great that Amazon has taken on this task, many other types of secrets are as likely to get leaked. Truffle Hog and Git Hound let administrators and developers catch the mistakes before they become costly accidents.