Sensitive information on GitHub

Oguzhan Ozturk
8 min read · Jun 11, 2021


Every day, API credentials, passwords, and customer information are unintentionally posted to GitHub.
Attackers use these keys to gain access to servers, steal personal data, and rack up astronomical AWS fees. Leaks on GitHub can cost a business thousands, if not millions, of dollars in damages.

Someone is looking for your personal data.

Your development team might miss the presence of sensitive information in your repositories, but that doesn’t mean it won’t be discovered by others. In fact, some people have honed their skills at unearthing secrets and other forgotten gems hidden deep within public repositories.

Common developer errors that cause data leaks:

  • Embedding hard-coded login credentials in code instead of making them a configuration option on the server the code runs on
  • Using public repositories instead of private repositories
  • Failing to use two-factor or multifactor authentication for email accounts
  • Abandoning repositories instead of deleting them when no longer needed

How to Avoid Leaks on GitHub

  • Forcing password changes periodically
  • Using 2FA or MFA for email accounts
  • Prohibiting the use of public repositories by your developers and requiring the use of private repositories
  • Prohibiting the use of hard-coded login credentials in repositories
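
As a rough illustration of the last point, credentials can be read from the server's configuration at startup instead of being written into the code. The sketch below is a minimal example under stated assumptions: the variable name APP_DB_PASSWORD and the function load_db_password are hypothetical, and a real deployment might use a secrets manager rather than plain environment variables.

```python
import os


def load_db_password(env=os.environ):
    """Return the database password from the environment, or fail loudly.

    APP_DB_PASSWORD is a hypothetical variable name chosen for this example.
    """
    password = env.get("APP_DB_PASSWORD")
    if not password:
        # Refusing to start is safer than falling back to a hard-coded default.
        raise RuntimeError("APP_DB_PASSWORD is not set; refusing to start")
    return password
```

Because the value never appears in the source tree, there is nothing for a repository scan to find, and rotating the password does not require a commit.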

Top 9 GitHub bug bounty tools according to SpectralOps:

1. gitLeaks

gitLeaks is an open-source static analysis command-line tool released under the MIT license. The gitLeaks tool is used to detect hard-coded secrets like passwords, API keys, and tokens in local and GitHub repositories (private and public).

gitLeaks uses regular expressions and string entropy checks to detect secrets based on custom rules, and exports reports in JSON, SARIF, or CSV format. gitLeaks can scan commit history and hook into your CI/CD pipeline.
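
To make the idea of combining regular expressions with entropy concrete, here is a minimal sketch in Python. It is not gitLeaks' implementation: the rule names, the patterns, and the 3.0 bits-per-character threshold are all assumptions picked for illustration.

```python
import math
import re
from collections import Counter

# Illustrative rules only; these are not gitLeaks' actual rule definitions.
RULES = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic-assignment": re.compile(
        r"(?i)\b(?:password|secret|token|api_key)\b\s*[:=]\s*['\"]([^'\"]{8,})['\"]"
    ),
}


def shannon_entropy(text):
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def scan_line(line, min_entropy=3.0):
    """Return the names of the rules that flag this line."""
    hits = []
    for name, pattern in RULES.items():
        match = pattern.search(line)
        if not match:
            continue
        # Rules with a capture group are also gated on entropy, which
        # filters out low-variety placeholders such as "xxxxxxxxxxxx".
        if pattern.groups and shannon_entropy(match.group(1)) < min_entropy:
            continue
        hits.append(name)
    return hits
```

The regular expression narrows the search to strings that look like assignments or known key formats, and the entropy check then discards values that are too repetitive to be a real key. Tuning that threshold is a trade-off between missed secrets and false positives.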

Pros:

gitLeaks is an open-source project that is free to use and actively developed with more than 50 contributors. gitLeaks includes integration, audit, and cloning features that are not available in most open-source projects.

Cons:

With no user interface and limited integration options, gitLeaks is mostly suitable for security professionals, researchers, or niche development projects.

2. SpectralOps

Spectral offers one of the most comprehensive secret scanning solutions, integrating into every facet of the build process. Whether it’s a static build, pre-commit to Git, or CI integration, Spectral offers simple integration options that can be enhanced using plugins and hooks.

Another interesting feature is Spectral’s ability to scan Git repositories not just for configuration issues and secrets lurking in the code, but also for logs, binaries, and other data in the codebase which you may not intuitively think of as a potential leak source.

Pros:

Spectral uses an intuitive user interface that makes it much more accessible and suitable for corporate management. The AI and machine learning algorithms behind Spectral’s secret scanning technology are meant to raise detection rates and lower false-positive rates over time as the system processes more data.

Cons:

Spectral is not well suited to small projects or single developers. It is designed for a development team collaborating on a large codebase.

3. Git-Secrets

Git-Secrets is an open-source command-line tool used to scan developer commits and “--no-ff” merges to prevent secrets from accidentally entering Git repositories. If a commit or merge matches a prohibited regular expression pattern, the commit is rejected.
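
The reject-on-match behavior can be sketched in a few lines of Python. This is an illustration of the idea rather than Git-Secrets' own code: the PROHIBITED and ALLOWED lists, the function names, and the choice to scan only the added lines of a unified diff are all assumptions made for the example.

```python
import re

# Hypothetical prohibited and allowed patterns, chosen for illustration.
PROHIBITED = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"(?i)password\s*=\s*\S+"),
]
ALLOWED = [re.compile(r"AKIAIOSFODNN7EXAMPLE")]  # a documented example key


def violations(diff_text):
    """Return added lines in a unified diff that match a prohibited pattern."""
    found = []
    for line in diff_text.splitlines():
        # Only lines added by the commit matter; skip the "+++ b/file" header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        added = line[1:]
        if any(p.search(added) for p in ALLOWED):
            continue
        if any(p.search(added) for p in PROHIBITED):
            found.append(added)
    return found


def check_commit(diff_text):
    """Return 0 to accept the commit, 1 to reject it (hook-style exit code)."""
    return 1 if violations(diff_text) else 0
```

A Git hook that exits with a non-zero status aborts the commit, which is why the sketch returns 1 on any violation.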

Pros:

Git-Secrets can integrate into the CI/CD pipeline to monitor commits in real time. One of its distinctive security features is support for “secret providers”, which can prevent secrets from ever showing up in a commit.

Cons:

Git-Secrets uses fairly simple detection algorithms that rely mainly on regular expressions, which can produce many false positives. The project is no longer maintained on a regular basis and may not be suitable for use in a professional development environment.

4. Whispers

Whispers is an open-source static code analysis tool designed to search for hardcoded credentials and dangerous functions.

It can run as a command-line tool or be integrated into your CI/CD pipeline. The tool is designed to parse structured text such as YAML, JSON, XML, .npmrc, .pypirc, .htpasswd, .properties, pip.conf, conf / ini, Dockerfile, shell scripts, and Python 3 (as AST), as well as declaration and assignment formats for JavaScript, Java, Go, and PHP.
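
Parsing structured text, rather than matching raw lines, lets a scanner reason about key names and values. The sketch below shows the general approach for JSON only. It is not Whispers' code, and the SENSITIVE_KEYS list and function names are hypothetical.

```python
import json

# Hypothetical key-name fragments treated as sensitive in this example.
SENSITIVE_KEYS = ("password", "secret", "token", "apikey", "api_key")


def find_secrets(node, path=""):
    """Yield (path, value) pairs whose key name looks sensitive."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            looks_sensitive = any(s in str(key).lower() for s in SENSITIVE_KEYS)
            if isinstance(value, str) and value and looks_sensitive:
                yield child, value
            else:
                # Keep walking into nested objects and arrays.
                yield from find_secrets(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_secrets(item, f"{path}[{index}]")


def scan_json(text):
    """Parse a JSON document and list suspicious key/value pairs."""
    return list(find_secrets(json.loads(text)))
```

For example, scanning a config with a nested "db" object that contains a non-empty "password" string reports the pair under the path db.password, which is far easier to act on than a bare line number.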

Pros:

Right out of the box, Whispers supports a wide range of secret detection formats, covering Passwords, AWS keys, API Tokens, Sensitive files, Dangerous functions, and more. Beyond its native capabilities, Whispers includes a plug-in system that can be used to further extend its scanning capabilities to new file formats.

Cons:

Whispers is designed to accompany other secret scanning solutions. It does not perform deep scans on actual code and mostly focuses on structured text files. Scanning rules are based on a limited combination of regular expressions, Base64, and ASCII detection.

5. GitHub Secret scanning

When you host public repositories on GitHub, GitHub provides its own integrated secret scanning solution, capable of detecting popular API key and token structures. To scan private repositories, you are required to obtain an Advanced Security license. You can extend the detection algorithm by supplying regular expressions that detect custom secret string structures.

Pros:

Using GitHub’s user interface makes it a lot easier to visualize the scanning, configuration, and integration process. Extensive support for the API key and token structures of many popular web services is included with the service, offering a strong starting point for any security evaluation.

Cons:

Secret scanning for private repositories is currently in beta. The service as a whole has a very narrow focus, mostly targeting known string structures such as API Keys and Tokens while ignoring other secrets such as database passwords, email addresses, administrative URLs, etc.

6. Gittyleaks

Gittyleaks is a straightforward Git secrets scanner command line tool capable of scanning and cloning repositories. It attempts to discover usernames, passwords, and emails that should not be included in code or configuration files.

Pros:

Gittyleaks is a simple tool that can be used to quickly scan repositories for obvious secrets. Its simplicity helps introduce the concept of secret scanning without the more complex configuration required by other solutions.

Cons:

Due to its simplicity and fixed rules, Gittyleaks is mostly useful as an introductory tool to help educate users about secrets in code. Gittyleaks lacks the features and flexibility required by commercial development teams.

7. Scan

Scan is a comprehensive open-source security audit tool. It provides strong integration with popular repositories and pipelines such as Azure, BitBucket, GitHub, GitLab, Jenkins, TeamCity, and many more.

Scan also supports a broad section of popular frameworks and languages, integrates into the CI/CD pipeline to provide real-time commit protection, and provides extensive reporting capabilities.

Pros:

Due to its well-maintained open-source nature, Scan is possibly one of the most powerful and flexible DevSecOps tools you can get for free.

Cons:

While Scan is indeed powerful and flexible, its sparse user interface and complex setup ensure that only a limited number of security experts will be truly capable of extracting the best results from Scan’s feature set.

8. Git-all-secrets

Git-all-secrets is an open-source secret scanner aggregation project. This tool currently relies on two open-source secret scanning projects: truffleHog and repo-supervisor — two projects using regular expression and high entropy secret detection algorithms. Git-all-secrets aggregates the combined results of both scanners to present a more comprehensive picture.
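
The aggregation step can be pictured as a merge keyed on where each finding occurred. The sketch below is an assumption-laden illustration, not Git-all-secrets' code: the Finding type, its fields, and the rule labels are invented for the example, and real scanners emit richer output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    path: str   # file the scanner flagged
    line: int   # line number within that file
    rule: str   # hypothetical label for the rule or heuristic that fired


def aggregate(*scanner_results):
    """Merge findings from several scanners, keyed by file and line.

    Returns a dict mapping (path, line) to the sorted rule names that
    flagged that location, so overlapping reports collapse into one entry.
    """
    merged = {}
    for results in scanner_results:
        for finding in results:
            merged.setdefault((finding.path, finding.line), set()).add(finding.rule)
    return {location: sorted(rules) for location, rules in sorted(merged.items())}
```

A location flagged by both a regular expression rule and an entropy rule ends up with two labels, which is a reasonable signal that it deserves attention first.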

Pros:

Git-all-secrets introduces an interesting concept that tries to enhance secret scanning results by not relying on a single algorithm.

Cons:

While the approach is novel, Git-all-secrets’ underlying scanning still relies on basic algorithms, and the project is no longer actively maintained. For now, the tool is more of a proof of concept that other projects may build on in the future.

9. Detect-secrets

Detect-secrets is an actively maintained open-source project designed with the enterprise client in mind.

It was created to prevent new secrets from entering the code base, detect when those preventions are explicitly bypassed, and provide a checklist of secrets to keep in secure storage. Detect-secrets works by periodically comparing the code against heuristically crafted regular expressions to identify new secrets that may have been committed.
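
One way to picture this "only report what is new" approach is a comparison against a stored baseline. The sketch below is a simplified illustration, not Detect-secrets' implementation: the function names are hypothetical, and storing a SHA-256 digest instead of the raw secret is an assumption of this example.

```python
import hashlib


def fingerprint(path, secret):
    """Identify a finding by file path plus a hash, so the baseline
    never stores the secret itself."""
    digest = hashlib.sha256(secret.encode("utf-8")).hexdigest()
    return (path, digest)


def build_baseline(findings):
    """Build a baseline from (path, secret) pairs produced by any scanner."""
    return {fingerprint(path, secret) for path, secret in findings}


def new_secrets(baseline, current_findings):
    """Return the (path, secret) pairs that are absent from the baseline."""
    return [
        (path, secret)
        for path, secret in current_findings
        if fingerprint(path, secret) not in baseline
    ]
```

Known findings stay in the baseline as a to-do list for later cleanup, while anything outside it is treated as a fresh leak that should block the change.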

Pros:

Detect-secrets’ scanning method avoids the overhead of scanning through entire git histories, as well as the need to scan the entire repository every single time. The plugin support is excellent, with 18 different plugins currently available, spanning AWS keys, Entropy Strings, Base64 encoding, Azure Keys, and many more.

Cons:

The pre-commit hook implements only basic heuristics to try and prevent obvious secrets from being committed. If secrets are split across multiple lines or do not include enough entropy, they may not be detected in real-time.

Summary

It is blatantly obvious that actively scanning Git repositories and developer commits to prevent secrets from leaking should become a mandatory part of every company’s software development pipeline.

The examples of poorly managed code security in this article are just the tip of the iceberg. Every day, personally identifiable information and private intellectual property are leaked and picked up by malicious actors. These leaks often result from lax code security practices or simple human error.

You can mitigate many of these issues by using secret scanning technology integrated right into the CI/CD pipeline, and active secret scanning of Git repositories associated with these projects.

Sources:
Top 9 Git Secret Scanning Tools for DevSecOps — Spectral (spectralops.io)
9 Leaky GitHub Repositories Expose Sensitive Data of 200K U.S. Patients (eccouncil.org)
How I made $10K in bug bounties from GitHub secret leaks (tillsongalloway.com)
Plugging Git Leaks: Preventing and Fixing Information Exposure in Repositories — Honeybadger Developer Blog
