This post is so long that your email server already flagged me as a threat to national security.
When the bots take over, I’m to be the very first to die. It is written.
Click the title to read it on the Substack website like a civilized human with scroll power.
TLDR: Bloomberg published a piece making claims about the 8(a) program that were total bullshit. One number was wildly inflated; the other was lying with statistics (and still inflated) by stripping out context.
Parts of what they said were only true in the sense that it can be true the average height in your family is 6’5” — without mentioning the context that your brother adopted an orphan from Serbia who now plays in the NBA.
In other words, context is everything when it comes to doing honest statistics.
With multiple public datasets and websites to “check federal spending” or “see what the government is up to” in circulation — almost none of which specify their data sources, filters, procedures, etc. — it’s important to be rigorous, state your methods clearly, and do your work transparently, as I’ve tried to do here.
The USASpending.gov site is the best way to do this because it is the official implementation of the 2018 DATA Act, which requires federal agencies to report standardized, detailed financial data directly to the public. Unlike other aggregators or summary dashboards, USASpending provides raw, auditable data with clear sourcing, making it the most reliable foundation for anyone who actually wants to understand where the money goes—and how.
This post is a supplement to my ongoing series, How to Not Suck at Math.
Think of it as a bonus lesson in a dark art: how people lie with statistics—and how to catch them in the act.
Recently, Bloomberg published a set of claims that fall apart under scrutiny. It’s a classic case of statistical storytelling that sounds right, feels right, and is completely wrong.
So we’re going to walk through it, line by line, and I’ll show you how the numbers were bent into shape to fit a predetermined narrative.
This isn’t just a call-out. It’s a hands-on lesson. I’m including all the code, and yes, it’s going to be loooooooong.
But if you’re into math, stats, or data science, it’ll be worth it.
And if you’re not…like, do you even geek, bro?
Let’s begin.
On Monday, July 21, Bloomberg Government published a piece raising alarm bells about the 8(a) Business Development Program, citing fraud investigations, a recent audit, and a supposed downturn in contract obligations for fiscal 2025. The article frames 8(a) as a prime site of federal waste, fraud, and abuse.
They cite a bribery conviction, a new audit, and toss around numbers like “13% of set-aside dollars” and “8% of all small business obligations from the Defense Department” to suggest that the program is bloated and out of control.
But those numbers are deeply misleading.
In fact, they’re a perfect example of how to lie with statistics.
Both figures drastically exaggerate the size and impact of the 8(a) program, making it seem like a dominant force when it’s anything but.
If you’re serious about rooting out waste in federal contracting, this is the wrong fire to chase.
The 8(a) program is a small, easy, headline-ready scapegoat—but it’s not the villain. This is a mistake-in-progress, reminiscent of Florida spending a fortune to drug-test food stamp recipients, only to learn that poor Floridians are pretty good at just saying no — better than average, in fact. Other states waste this money, too, some spending as much as $6,500 per applicant tested, which is far more than most food stamp recipients receive in a given year.
Chasing fraud in the wrong places doesn’t save money. It wastes it.
Let’s break it down.
This post is written with all the code and all the information for you to run the analysis yourself, so I won’t show every intermediate output—just what mattered and what we did next. But if you’re not a coder and you’re not going to be running the code, don’t worry—it’ll still make sense.
Working With Real Data
This analysis is based on fresh contract data from fiscal years 2015 through 2025 to date. I pulled everything on July 22, 2025, directly from the Award Data Archive at usaspending.gov.
Each fiscal year’s dataset was downloaded in full, unmodified, from official U.S. government sources. No scraping, no proprietary databases, no guesswork.
Here’s what this analysis covers:
How 8(a) contracts were identified
How sole-source vs. competed awards were distinguished
How the dataset was filtered to isolate the Department of Defense
How obligations were aggregated and compared across years and categories
How Bloomberg’s reported figures were tested
Every step is documented and reproducible.
All logic is explicit and based only on fields available in the public dataset.
No inferred or private data were used.
The Two Big Bloomberg Claims
Bloomberg’s newsletter made two central claims:
That 8(a) sole-source contractors received 8 percent of all small business obligations at the Department of Defense
That about 13 percent of small business set-aside dollars went to 8(a) sole-source awards from FY2015 through FY2024
This is meant to sound scary because “sole source” is supposed to imply “bad”. But in federal contracting, sole source doesn’t mean “shady” or “corrupt.” It means streamlined.
It means targeted—a mechanism Congress created for specific situations where speed, expertise, or small business goals outweigh the need for full competition. In the case of the 8(a) program, that includes helping small businesses gain a foothold in a market where incumbents dominate. Many of those businesses are run by veterans, who actually account for a large share of 8(a) participation.
Sole-source contracts exist for a reason. Not every job is big enough to warrant a full bidding process. Sometimes the agency needs something fast. Sometimes only a handful of qualified vendors exist, and pretending otherwise just burns time and money. When done right—and within the program’s strict dollar limits and oversight rules—sole sourcing is a tool for efficiency, not a loophole.
What Bloomberg leaves out is that these contracts are capped, regulated, and reviewed. They’re not giveaways. They’re one of the few ways new entrants can break into a space otherwise locked up by the usual suspects.
Each of the Bloomberg claims may be technically true—within narrow, selectively chosen definitions (we’ll get to that). But taken together—and stripped of context—they paint a misleading picture of the 8(a) program’s size and influence.
The “8 percent” figure cherry-picks a single agency and looks only at small business obligations. The “13 percent” figure zooms out to government-wide set-aside dollars—but isolates just one contracting mechanism. Neither number tells you how big 8(a) sole-source awards actually are in the context of total federal contracting. And together, they suggest that 8(a) is steering the small business ship, when it’s barely clinging to the rudder.
This analysis separates fact from framing.
By breaking out awards by program status, competition type, and recipient characteristics, we can see what’s genuinely 8(a)—and what just got caught in the same net.
You Can Duplicate This Analysis
If someone wants you to believe a data-driven claim, they should show their work.
Fortunately, being anxiety-disordered means I assume every reader is a hostile PhD with a grudge and a red pen.
I document like my dignity depends on it—because obviously, it does.
This post includes everything you need to follow along using public data and open-source tools. If you have a decently fast machine and basic Python skills, you can replicate everything by just copy-pasting the code blocks.
It’s not quite “show your work or die,” but it’s close enough for government contracting.
Step 1: Download the Data
Go to the USAspending Award Data Archive.
Download the full contract award files (not sub-awards or assistance data) for each fiscal year from:
FY2015 through FY2025 (to date)
Each year comes as a set of CSVs representing every contract transaction the federal government processed that year. It’s a lot of data—millions of rows per year.
Step 2: (Optional) Convert to Pickle Format
To speed things up, I recommend converting each year’s data to .pkl format—a binary format that Python can load much faster than raw CSVs.
This step isn’t required, but it makes working with the data a lot more manageable, especially if you're comparing across years.
Note: You’ll want at least 32 GB of RAM to process everything at once. If your system has less, you can load and analyze one year at a time without any trouble.
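If you want a head start on that conversion, here’s a minimal sketch. The folder layout and file names are assumptions (I’m pretending each fiscal year’s CSV chunks live in their own FY20XX subfolder), so adjust the paths to match however you organized your downloads.
import os
import glob
import pandas as pd
# ASSUMED layout: one subfolder of CSV chunks per fiscal year, e.g. ...\Downloads\FY2015\*.csv
csv_root = r'C:\Users\YOUR_USERNAME\Path\To\Your\CSV\Downloads'
pickle_dir = r'C:\Users\YOUR_USERNAME\Path\To\Your\Pickle\Files'
for year in range(2015, 2026):
    # gather every CSV chunk for this fiscal year
    csv_files = glob.glob(os.path.join(csv_root, f'FY{year}', '*.csv'))
    # read each chunk as strings to avoid mixed-type surprises, then stack them
    chunks = [pd.read_csv(f, dtype=str, low_memory=False) for f in csv_files]
    df_year = pd.concat(chunks, ignore_index=True)
    # save the whole fiscal year as a single pickle
    df_year.to_pickle(os.path.join(pickle_dir, f'FY{year}.pkl'))
    print(f'FY{year}: {len(df_year):,} rows saved')
Reading everything in as strings keeps the raw values intact; the numeric cleanup happens later, when we convert the obligation columns.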
Step 3: Update the File Path
When you see a line like this in the code:
pickle_dir = r'C:\Users\YOUR_USERNAME\Path\To\Your\Pickle\Files'
Just update it to match your own file path. That’s the only change you’ll need to make. I’ve flagged it for you in ALL CAPS so you can’t miss it.
The rest of the analysis will run as-is, using only common Python packages like pandas, os, and matplotlib.
If you want to see whether Bloomberg’s claims hold up, you don’t have to take my word for it. Run the code and see for yourself.
Why Am I Using Pickle Files?
Federal contract data comes as massive CSV files—one set per fiscal year, broken into chunks. To speed things up and reduce memory strain, I converted each year’s data into a more efficient format: Python pickle files (.pkl).
Pickle stores the data in compact binary form, so it loads much faster than parsing CSVs—especially helpful when you’re working with millions of rows and hundreds of columns.
For this project, I downloaded the full contract award data for FY2015 through FY2025 (to date) on July 22, 2025. Each year’s CSVs were merged and saved as a single pickle file for consistency.
To keep things lean, I filtered each dataset down to just the fields needed to test Bloomberg’s claims:
Whether the contract was part of the 8(a) program
Whether it was sole-source or competed
Whether it came from the Department of Defense
Dollars obligated
Date and fiscal year
Classification (set-aside type, NAICS, PSC)
Vendor characteristics (ANC, NHO, SDVOSB, etc.)
This targeted format makes the analysis faster, cleaner, and easier to follow—every chart and claim is traceable back to a specific, public variable.
Loading the Data
Let’s start by loading the data.
This block imports the required libraries, sets your file path, selects only the relevant fields, and loads all years into a single dataframe. If you’re running this yourself, the only thing you need to change is the file path on your machine.
import os
import pandas as pd
# define the folder where the pickle files are stored
# THIS IS WHERE YOU CHANGE TO YOUR LOCATION IF YOU ARE RUNNING THIS ANALYSIS YOURSELF
pickle_dir = r'Your file path goes here'
# define the fiscal years to include
years = range(2015, 2026)
# define the columns to keep for Bloomberg analysis
columns_to_keep = [
'total_dollars_obligated',
'federal_action_obligation',
'action_date',
'action_date_fiscal_year',
'awarding_agency_name',
'funding_agency_name',
'type_of_set_aside',
'type_of_set_aside_code',
'c8a_program_participant',
'sba_certified_8a_joint_venture',
'historically_underutilized_business_zone_hubzone_firm',
'service_disabled_veteran_owned_business',
'veteran_owned_business',
'women_owned_small_business',
'economically_disadvantaged_women_owned_small_business',
'small_disadvantaged_business',
'extent_competed',
'extent_competed_code',
'other_than_full_and_open_competition',
'solicitation_procedures',
'naics_code',
'naics_description',
'product_or_service_code',
'product_or_service_code_description',
'award_type',
'award_id_piid',
'modification_number',
'contract_award_unique_key',
'transaction_description',
'recipient_uei',
'recipient_name',
'alaskan_native_corporation_owned_firm',
'tribally_owned_firm',
'native_hawaiian_organization_owned_firm',
'solicitation_date',
'performance_based_service_acquisition'
]
# create an empty list to hold each year's dataframe
df_list = []
# loop through each year and load its trimmed pickle
for year in years:
pickle_path = os.path.join(pickle_dir, f'FY{year}.pkl')
df = pd.read_pickle(pickle_path)
# trim to selected columns
df = df[columns_to_keep].copy()
# add fiscal year column
df['source_fiscal_year'] = year
df_list.append(df)
# concatenate all years into one dataframe
df_all = pd.concat(df_list, ignore_index=True)
# print shape of final result
print(f'combined dataframe shape: {df_all.shape}')
Verifying the Data Integrity
Before doing any math, we need to make sure the dataset is clean. Even small amounts of accidental duplication can throw off totals and trends—especially when we start grouping by year, agency, or contract type.
Why This Matters
Each row in the federal contract data represents a single transaction: a specific contract, a modification (if any), and a dollar amount. But because these files are massive and often split across multiple CSVs, it’s easy to accidentally duplicate rows while downloading or stitching them together. That’s especially true if you're working across fiscal years or processing them in chunks.
What We’re Checking For: Duplicates
There are two types of duplication worth catching:
Full duplicates: Rows that are completely identical, field for field. These are rare but worth eliminating up front.
Contract-mod duplicates: Rows that share the same contract identifier and modification number. These could represent legitimate repeat funding—or they could be accidental copies. We’ll flag and investigate them.
What We’ll Do About It
If we find duplicates, we’ll:
Count them
Check whether they’re real (like split funding lines) or accidental
Drop them only if they’re clearly junk
The goal is simple: make sure that every contract dollar we analyze reflects a real, unique obligation—not a copy-paste artifact.
Checking for Duplicates
Before we start slicing and charting, we need to verify that the dataset is clean. Even minor duplication can distort totals—especially when grouped by year, agency, or award type.
# check for exact full-row duplicates (keep first occurrence)
full_dupes = df_all.duplicated() # same as duplicated (keep='first')
num_full_dupes = full_dupes.sum()
print(f'full duplicate rows to drop: {num_full_dupes:,}')
# check for duplicated contract-modification pairs (keep first)
contract_dupes = df_all.duplicated(subset=['contract_award_unique_key', 'modification_number'])
num_contract_dupes = contract_dupes.sum()
print(f'duplicate contract-modification pairs to inspect: {num_contract_dupes:,}')
We ran two duplication checks:
Exact duplicates (first half of above code block):
We found 1,701 rows that were completely identical—likely introduced during download or file stitching. These were safely dropped.
Duplicate contract-modification pairs (second half of above code block):
We identified 77,639 rows that shared the same contract_award_unique_key and modification_number. That sounds bad, but it’s often valid. A single contract modification can involve multiple line items, agencies, or funding streams.
# drop fully duplicated rows (identical across all columns)
before = len(df_all)
df_all = df_all.drop_duplicates()
after = len(df_all)
print(f'dropped {before - after:,} exact duplicate rows')
print(f'new shape: {df_all.shape}')
# find duplicated contract-modification pairs (keep first occurrence)
dupe_mask = df_all.duplicated(subset=['contract_award_unique_key', 'modification_number'])
# extract just those rows
df_contract_dupes = df_all[dupe_mask].copy()
# randomly sample 50 unique contract-modification pairs
sample_keys = (
df_contract_dupes[['contract_award_unique_key', 'modification_number']]
.drop_duplicates()
.sample(n=50, random_state=1)
)
# extract all rows that match the sampled contract-modification keys
sample_df = df_all.merge(sample_keys, on=['contract_award_unique_key', 'modification_number'])
# show how many rows this produced
print(f'{len(sample_df):,} rows across 50 sampled contract-modification pairs')
# sort for easier inspection
sample_df.sort_values(['contract_award_unique_key', 'modification_number'], inplace=True)
# show the first 50 rows
sample_df.head(50)
Sampling to Verify
To check whether those contract-mod “dupes” were real or junk, we pulled a random sample of 50 such pairs and inspected all matching rows (see code block above).
What We Found
They’re legit. The rows in each pair differed on key details: dollar amounts, descriptions, classifications, etc.
They reflect complexity, not error. These are standard multi-line accounting entries in federal data—not copy-paste mistakes.
Scientific notation showed up. Some dollar fields displayed as 1.0e6 instead of 1,000,000. We fixed that for readability but didn’t alter the values.
Conclusion
We dropped only the exact full-row duplicates. The contract-mod pairs stay. That preserves the integrity of the data while avoiding artificial inflation or oversimplification.
Cleaning Up the Dollar Fields
Before running any math, we made sure the obligation columns were:
Converted to numeric values (see code block below)
Displayed without scientific notation
Spot-checked to confirm proper formatting and data types
All good. Now we can start breaking down the Bloomberg claims.
# step 1: ensure both columns are numeric (coerce errors to NaN)
df_all['total_dollars_obligated'] = pd.to_numeric(df_all['total_dollars_obligated'], errors='coerce')
df_all['federal_action_obligation'] = pd.to_numeric(df_all['federal_action_obligation'], errors='coerce')
# step 2: suppress scientific notation globally for display
pd.set_option('display.float_format', '{:,.2f}'.format)
# step 3: inspect 10 random sample values from each column
print('\nsample total_dollars_obligated values:')
print(df_all['total_dollars_obligated'].dropna().sample(10, random_state=1))
print('\nsample federal_action_obligation values:')
print(df_all['federal_action_obligation'].dropna().sample(10, random_state=1))
# step 4: confirm data types of both columns
print('\ndata types of obligation columns:')
print(df_all[['total_dollars_obligated', 'federal_action_obligation']].dtypes)
Not Every 8(a) Participant Contract Is an 8(a) Contract
This part is crucial.
Just because a company is in the 8(a) program doesn’t mean every contract it wins is an 8(a) award. That’s a rookie mistake—and I suspect it’s the one Bloomberg made, loudly and confidently. We’ll see, won’t we? That’s the beauty of an open society that puts its data online: we can find out!
Unfortunately, it’s also the kind of mistake that inflates numbers and feeds misleading headlines to people who are just salivating for a scandal, like a raccoon clawing at your trash can because it smells a sandwich.
Yes, we have a lot of data. But interpreting it correctly takes more than a keyword search and a hot take.
It takes rigor. It takes precision.
And like I said, it helps to be a little anxiety-disordered—because when you assume every reader is out to prove you're wrong, you document like your life depends on it.
And maybe it does.
Who’s to say? 👀
So here’s what’s actually going on.
We have a field called c8a_program_participant. If it’s marked True, that means the vendor is in the 8(a) program. Great.
But that does not mean the award itself was made through the 8(a) mechanism!
Here’s a very normal, very not-scandalous example:
A company in the 8(a) program bids on a full-and-open competition—no set-asides, no special rules—and wins.
c8a_program_participant = True
type_of_set_aside_code = None
extent_competed = Full and Open
This is not an 8(a) contract. It’s just a contract that happens to have been won by an 8(a) participant.
This also happens with HUBZone awards, SDVOSB set-asides, or general small business competitions.
Same company, totally different procurement pathway.
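To see how common this is, here’s a quick sketch you can run once df_all is loaded. The string handling on the participant flag is an assumption: depending on how the files were read, it may arrive as text rather than a true boolean.
# flag rows where the vendor is an 8(a) program participant
is_participant = (
    df_all['c8a_program_participant']
    .astype(str).str.strip().str.lower()
    .isin(['true', 't', 'yes', 'y', '1'])
)
# flag rows actually awarded through the 8(a) mechanism (sole source or competed)
is_8a_mechanism = df_all['type_of_set_aside_code'].isin(['8A', '8AN'])
# transactions won by 8(a) participants through some other pathway
outside_8a = df_all[is_participant & ~is_8a_mechanism]
print(f'transactions won by 8(a) participants outside the 8(a) mechanism: {len(outside_8a):,}')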
So when we start tagging transactions as “8(a)” or not, we’re going to be methodical. We’ll use the codes. We’ll cross-check with vendor identity flags. We’ll do it all carefully, because nothing in this dataset screams “8(a)!” with absolute clarity—you have to look closely and, sometimes, squint.
But that’s fine. I like squinting.
And I love catching a raccoon in the act.
Building Rigorous Logic
Before we can tag each contract with the correct set-aside type, we need to understand what we’re working with. The dataset includes two fields—one coded, one plain English—that describe how (or whether) a contract was set aside for a particular category of business.
To build reliable logic, we first look at the full list of unique values in both fields. This helps us identify which tags are valid, which ones are rare, and how many records are missing this information altogether.
This isn’t glamorous work—but if you’re going to challenge public claims about federal contracting, you need to know exactly what the data’s saying before you start arguing with it.
# show value counts for type_of_set_aside_code
print('\ntype_of_set_aside_code value counts:')
print(df_all['type_of_set_aside_code'].value_counts(dropna=False))
# show value counts for type_of_set_aside (plain English)
print('\ntype_of_set_aside value counts:')
print(df_all['type_of_set_aside'].value_counts(dropna=False))
After running this block, the results told us a lot.
More than half the records have no set-aside classification at all—either because they weren’t set aside, or because the field wasn’t filled in. That’s expected. Set-aside contracts make up only a portion of all federal contracting, and it’s normal for many awards to go through full-and-open competition.
Among the records that were classified, we found a clean and consistent set of standard codes for the major categories:
8AN = 8(a) sole source
8A = 8(a) competed
SBA = small business set-aside (total)
SBP = small business set-aside (partial)
HZC = HUBZone
SDVOSBC = service-disabled veteran
WOSB and EDWOSB = women-owned categories
These are the ones we’ll tag directly, no guesswork needed.
But then there’s the long tail: rare, inconsistently entered, or hyper-specific codes that show up in small numbers (single- or double-digit counts, out of over 62 million rows in the full 11-year dataset). These include:
ISBEE (Indian Small Business Economic Enterprise)
BI (Buy Indian)
HMT (HBCU or Minority Institution set-aside – total)
VSS (Veteran Sole Source)
SDB (Small Disadvantaged Business)—which shows up twice total
Individually, these categories are too rare to analyze in isolation—but collectively, they still matter. So we’ll roll them into broader categories like "indian_set_aside" or "other_disadvantaged" to avoid letting edge-case labels skew the broader picture.
This approach makes the analysis cleaner, more transparent, and much harder to pick apart. We’re not collapsing nuance—we’re managing it in a way that keeps the big picture accurate while still honoring the detail.
Now that we’ve mapped the terrain, we can build the tagging system.
Why This Tagging System Matters
This tagging system isn’t just about neat categorization—it’s about accuracy. About not overstating the scale or influence of the 8(a) program.
About making sure we’re actually looking at what we think we’re looking at.
Bloomberg’s article claimed that 8(a) sole-source contractors were 13% of set-aside dollars and 8% of small business DoD obligations. Those numbers may or may not be technically correct—we’ll see, because we’re going to calculate them ourselves—but they raise important questions: how were they calculated? What were they counting, and why?
One possibility is that they conflated awards to 8(a) participants with awards made through the 8(a) program. Those aren’t the same thing. A company in the 8(a) program can—and often does—win contracts through full-and-open competition or other set-aside categories that have nothing to do with the 8(a) process.
But whether that’s how Bloomberg got their number or not, our goal is different: to get it right.
We’re building this tagging system to clearly distinguish:
Contracts that were awarded through the 8(a) mechanism
Contracts that simply went to vendors who happen to be 8(a)-certified
Everything else
Because if we exaggerate the size of the program, we risk focusing oversight in the wrong place. It becomes the infamous food stamp drug testing fiasco all over again—a fortune spent to uncover a problem that wasn’t there, while bigger issues burned quietly in the background.
We don’t want better headlines. We want better focus. And that starts with rigorous tagging.
How We’re Tagging Each Contract
To avoid the kind of confusion that can inflate the 8(a) program’s footprint—or misdirect fraud investigations—we’re building a tagging system based on clear, testable rules.
Each contract gets one human-readable tag based on a hierarchy of evidence. Here’s how it works:
Step 1: Use the Official Set-Aside Code
If the type_of_set_aside_code field is populated, that’s our primary classification.
Examples:
8AN → "8a_sole_source"
8A → "8a_competed"
SDVOSBC → "sdvosb"
HZC, HZS → "hubzone"
SBA, SBP → "small_business"
NONE → "not_set_aside"
These are confident, standardized tags.
Step 2: Use the Description Field (if Code is Missing)
If the code is blank but the plain-English field type_of_set_aside is filled in, we’ll classify based on keyword mapping.
For example:
"8(A) SOLE SOURCE"
→"8a_sole_source"
"WOMEN OWNED SMALL BUSINESS"
→"wosb"
We’ll flag these as fallback cases in case further cleanup is needed later.
Step 3: Cross-Check with extent_competed
This is where things get sharper.
For any award that looks like an 8(a) sole source (e.g., type_of_set_aside_code = 8AN or similar), we’ll confirm that:
extent_competed is not "Full and Open"
Or, ideally, that it is explicitly labeled as "Not Competed" or similar
If there’s a mismatch—say, a contract marked as 8(a) but listed as “Full and Open”—we’ll flag it for manual review or tag it as "ambiguous_8a" to avoid overcounting.
This helps ensure we’re not labeling as “8(a) sole source” something that was actually competed or miscoded.
Step 4: Group Rare Codes Into Broader Buckets
Uncommon codes like:
ISBEE (Indian Set-Aside)
BI (Buy Indian)
HMP / HMT (HBCU / MI Set-Aside)
…will be grouped into umbrella tags like "indian_set_aside" or "hbcumi".
Step 5: Default to "not_set_aside"
If no code or description is present, and there's no clear evidence of set-aside status, the contract gets tagged as "not_set_aside".
That includes full-and-open competitions, undefinitized contracts, and everything else outside a formal set-aside program.
By including extent_competed, we’re building a tagging system that doesn’t just rely on labels—it checks the behavior of the contract award itself. That makes the analysis stronger, more defensible, and way harder to manipulate.
This is the system we’ll use to test Bloomberg’s claims, avoid inflated numbers, and make sure we’re not sending fraud investigators to chase ghosts.
# step 1: define primary set-aside mapping from type_of_set_aside_code
set_aside_map = {
'8AN': '8a_sole_source',
'8A': '8a_competed',
'SBA': 'small_business',
'SBP': 'small_business',
'HZC': 'hubzone',
'HZS': 'hubzone',
'SDVOSBC': 'sdvosb',
'SDVOSBS': 'sdvosb',
'WOSB': 'wosb',
'WOSBSS': 'wosb',
'EDWOSB': 'edwosb',
'EDWOSBSS': 'edwosb',
'ISBEE': 'indian_set_aside',
'IEE': 'indian_set_aside',
'BI': 'indian_set_aside',
'VSA': 'veteran',
'VSS': 'veteran',
'VSB': 'veteran',
'HMP': 'hbcumi',
'HMT': 'hbcumi',
'RSB': 'small_business',
'ESB': 'emerging_small_business',
'SDB': 'other_disadvantaged',
'HS3': 'other_disadvantaged'
}
# step 2: basic assignment from code mapping
df_all['set_aside_tag'] = df_all['type_of_set_aside_code'].map(set_aside_map)
# step 3: fallback tagging using description field if code is missing
fallback_map = {
'8(A) SOLE SOURCE': '8a_sole_source',
'8A COMPETED': '8a_competed',
'SMALL BUSINESS SET ASIDE - TOTAL': 'small_business',
'SMALL BUSINESS SET ASIDE - PARTIAL': 'small_business',
'HUBZONE SET-ASIDE': 'hubzone',
'HUBZONE SOLE SOURCE': 'hubzone',
'SERVICE DISABLED VETERAN OWNED SMALL BUSINESS SET-ASIDE': 'sdvosb',
'SDVOSB SOLE SOURCE': 'sdvosb',
'WOMEN OWNED SMALL BUSINESS': 'wosb',
'WOMEN OWNED SMALL BUSINESS SOLE SOURCE': 'wosb',
'ECONOMICALLY DISADVANTAGED WOMEN OWNED SMALL BUSINESS': 'edwosb',
'ECONOMICALLY DISADVANTAGED WOMEN OWNED SMALL BUSINESS SOLE SOURCE': 'edwosb',
'INDIAN SMALL BUSINESS ECONOMIC ENTERPRISE': 'indian_set_aside',
'INDIAN ECONOMIC ENTERPRISE': 'indian_set_aside',
'BUY INDIAN': 'indian_set_aside',
'VETERAN SET ASIDE': 'veteran',
'VETERAN SOLE SOURCE': 'veteran',
'HBCU OR MI SET-ASIDE -- PARTIAL': 'hbcumi',
'HBCU OR MI SET-ASIDE -- TOTAL': 'hbcumi',
'RESERVED FOR SMALL BUSINESS': 'small_business',
'EMERGING SMALL BUSINESS SET ASIDE': 'emerging_small_business',
'8(A) WITH HUB ZONE PREFERENCE': '8a_hubzone'
}
# apply fallback where no code-based tag exists
df_all.loc[df_all['set_aside_tag'].isna(), 'set_aside_tag'] = (
df_all.loc[df_all['set_aside_tag'].isna(), 'type_of_set_aside']
.map(fallback_map)
)
# step 4: extent_competed logic for sanity-checking 8a_sole_source
# reclassify questionable 8a_sole_source awards as ambiguous_8a if they were fully competed
competed_flag = df_all['extent_competed'].str.lower().fillna('')
fully_competed = competed_flag.str.contains('full and open')
df_all.loc[
(df_all['set_aside_tag'] == '8a_sole_source') & fully_competed,
'set_aside_tag'
] = 'ambiguous_8a'
# step 5: assign not_set_aside to anything still untagged
df_all['set_aside_tag'] = df_all['set_aside_tag'].fillna('not_set_aside')
Sanity Check: Did Our Tagging Work?
Before we move on, let’s make sure the tagging logic actually ran the way we think it did. This block does three things:
Confirms that every single row in the dataset got a set_aside_tag
Shows how many rows fall into each tag category
Flags a particularly interesting edge case — rows that were labeled “8(a) Sole Source” but marked as fully competed
That last group? That’s where the Bloomberg analysis likely fell apart — and where being neurotic AF turns into a data science superpower.
# check total row count vs tagged rows
print(f'total rows: {len(df_all):,}')
print(f'non-null set_aside_tag rows: {df_all["set_aside_tag"].notna().sum():,}')
# show value counts of the new tags
print('\nset_aside_tag value counts:')
print(df_all['set_aside_tag'].value_counts(dropna=False).sort_values(ascending=False))
# cross-check: how many 8a_sole_source were marked as ambiguous_8a
ambiguous_count = (df_all['set_aside_tag'] == 'ambiguous_8a').sum()
print(f'\n8a_sole_source reclassified as ambiguous_8a due to extent_competed: {ambiguous_count:,}')
# peek at a few ambiguous_8a rows for sanity check
print('\nsample ambiguous_8a rows:')
print(df_all[df_all['set_aside_tag'] == 'ambiguous_8a'][[
'type_of_set_aside_code',
'type_of_set_aside',
'extent_competed',
'c8a_program_participant',
'recipient_name'
]].sample(5, random_state=42))
What We Found — and Why It Matters
Our tagging logic worked as intended: every row got a set_aside_tag, and most of them were clear-cut. But a few stood out — specifically, 16,722 records that were labeled as “8(a) Sole Source” but also flagged as “competed” in the extent_competed field.
That’s not supposed to happen.
Sole source means no competition. So either:
The set-aside label is wrong,
The extent_competed label is wrong,
Or something weird is happening deep in the federal reporting system.
We’re tagging these as ambiguous_8a and setting them aside. They won’t be included in our main 8(a) calculations, because we can’t defend treating them as either sole source or competed 8(a) awards.
But we will analyze them separately, in case Bloomberg’s claims were inflated by including these fuzzy-edge cases.
In short: we’re not tossing them — but we’re not letting them skew our numbers either.
Even though 16,722 out of 62 million records is a rounding error statistically, we’re not tossing them or pretending they don’t exist — we’re just treating them with the healthy paranoia they deserve.
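If you want to size that ambiguous pile in dollars before moving on, here’s a quick sketch using the df_all and set_aside_tag built above:
# how much money do the ambiguous_8a rows actually represent?
ambig = df_all[df_all['set_aside_tag'] == 'ambiguous_8a']
ambig_total_billion = ambig['total_dollars_obligated'].sum() / 1e9
print(f'ambiguous_8a rows: {len(ambig):,}')
print(f'ambiguous_8a obligations: ${ambig_total_billion:,.2f}B')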
Now that every contract has been rigorously tagged, we can begin charting trends over time and by agency—including a separate look at the ambiguous_8a group to see how big a footprint these edge cases might have left.
# exclude 'not_set_aside' and limit to FY2015–FY2024 to match Bloomberg
filtered = df_all[
(df_all['set_aside_tag'] != 'not_set_aside') &
(df_all['source_fiscal_year'] <= 2024)
]
# group by fiscal year and set-aside tag
summary = (
filtered.groupby(['source_fiscal_year', 'set_aside_tag'])['total_dollars_obligated']
.sum()
.unstack(fill_value=0)
)
# convert to billions and round to one decimal place
summary_billion = (summary / 1e9).round(1)
# print the table
print(summary_billion)
With everything tagged and summed, we can finally step back and look at the big picture.
Over the past decade, the vast majority of dollars tagged with a set-aside went to small business awards overall. That’s not surprising. But within the smaller categories, we see that service-disabled veteran-owned businesses consistently received more than either 8(a) sole source or 8(a) competed—by a wide margin in most years.
Across all years, 8(a) sole source awards made up just 5.9% of total dollars tagged with a set-aside—well below Bloomberg’s 13% claim.
This data vis makes that clear. Nothing here supports the idea that 8(a) sole source is swallowing the budget.
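If you want the single overall percentage rather than the year-by-year table, here’s a short sketch that computes it from the summary dataframe built above (it assumes the 8a_sole_source column is present, which it will be if the tagging ran as shown):
# overall 8(a) sole source share of FY2015–FY2024 set-aside dollars
total_set_aside_dollars = summary.sum().sum()
sole_source_dollars = summary['8a_sole_source'].sum()
print(f'8(a) sole source share of set-aside dollars: {sole_source_dollars / total_set_aside_dollars:.1%}')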
How the Hell Did Bloomberg Get 13%?
We’ve now shown—using rigorous, publicly replicable data—that 8(a) sole-source awards made up just 5.9% of all federal set-aside contract obligations from FY2015 through FY2024.
So where the hell did Bloomberg get 13%?
Let’s speculate. Not because it matters what’s in their heads, but because understanding how statistical mistakes happen helps us build better bullshit detectors. Here are the most likely culprits:
1. They counted all awards to 8(a) participants—regardless of mechanism.
This is the classic rookie error. If you filter for vendors flagged as c8a_program_participant == True, and sum everything they were awarded—including full-and-open competitions, HUBZone awards, SDVOSB set-asides, and small business set-asides—you’ll get a much bigger number.
But it’s wrong. Being in the 8(a) program doesn’t mean every contract you win is an 8(a) award. It just means you’re eligible for them. We built our tagging system specifically to avoid this mistake.
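If you want to test that hypothesis yourself, here’s a sketch of the “wrong way” calculation: every set-aside dollar that went to an 8(a) participant, regardless of mechanism, over all set-aside dollars. I’m not claiming this reproduces Bloomberg’s exact method; it just shows how much a participant-based count inflates the share. As before, the string handling on the participant flag is an assumption.
# restrict to FY2015–FY2024 set-aside dollars, matching the earlier summary
set_aside = df_all[
    (df_all['set_aside_tag'] != 'not_set_aside') &
    (df_all['source_fiscal_year'] <= 2024)
]
# the rookie filter: any award to an 8(a) program participant, however it was competed
is_participant = (
    set_aside['c8a_program_participant']
    .astype(str).str.strip().str.lower()
    .isin(['true', 't', 'yes', 'y', '1'])
)
participant_dollars = set_aside.loc[is_participant, 'total_dollars_obligated'].sum()
all_set_aside_dollars = set_aside['total_dollars_obligated'].sum()
print(f'share if you count every set-aside award to an 8(a) participant: '
      f'{participant_dollars / all_set_aside_dollars:.1%}')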
2. They ignored the distinction between sole source and competed.
Bloomberg’s 13% claim specifically referenced sole-source 8(a) awards. But if they lumped in all 8(a) set-asides—including competed ones—they could plausibly inflate the number that high.
Again: the difference matters. Sole-source contracts follow different procedures and raise different policy concerns. If you’re going to yell about fraud risk, you don’t get to hand-wave the competition away.
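And here’s the quick check on this variant, lumping competed and sole-source 8(a) together (reusing the summary dataframe; both columns will exist if those tags showed up in the data):
# combined share of set-aside dollars for all 8(a) awards, competed plus sole source
all_8a_dollars = summary[['8a_sole_source', '8a_competed']].sum().sum()
total_set_aside_dollars = summary.sum().sum()
print(f'all 8(a) awards (competed + sole source) share of set-aside dollars: '
      f'{all_8a_dollars / total_set_aside_dollars:.1%}')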
3. They double-counted modifications.
If you don’t de-duplicate contract-modification pairs, you can easily count the same obligation twice—or three times. We ran those checks. We removed exact duplicates and reviewed mod-pairs carefully to avoid that trap. If Bloomberg didn’t, their numbers could be artificially inflated by structural noise.
4. They pulled from a narrower denominator.
Want to make a percentage look bigger? Shrink the bottom. If they calculated 8(a) dollars as a percent of just small business set-asides that could have gone to 8(a) firms (excluding other categories like SDVOSB, HUBZone, WOSB), the share would rise. That’s still misleading—it implies exclusivity where there is none—but it’s an easy way to nudge the ratio.
5. They trusted the labels without checking the logic.
A contract labeled as “8(a) Sole Source” that also says “Full and Open Competition” in the extent_competed field should be a red flag—not an accepted data point. We found 16,722 of those. Bloomberg may have included them uncritically, counting both the label and the contradiction.
We didn’t. We flagged them as ambiguous and set them aside.
We’re not saying Bloomberg cooked the books on purpose. They probably didn’t. They just…might not have a paranoid, neurotic gremlin on staff who triple-checks every logic fork like their self-worth depends on it.
We do. Hi! 🤓
The 13% figure isn’t necessarily malicious—it’s just what happens when you move too fast through federal data without understanding how it works, or without stopping to ask, “Wait, does this actually mean what I think it means?”
It takes rigor. It takes paranoia. It takes the kind of person who finds 16,722 edge-case records and thinks, “Ooh, fun.”
That’s how you catch it.
Bloomberg’s Other Big Claim
Let’s talk about that other headline number from Bloomberg: the claim that 8(a) contractors received 8% of all small business obligations at the Department of Defense.
At first glance, it sounds more reasonable than the 13% figure—and maybe it is. But even this claim falls apart when you actually run the numbers.
We applied the same rigorous, replicable tagging system to Department of Defense contracts from FY2015 through FY2025, isolating only those awarded through the 8(a) sole source mechanism. Not all 8(a) vendors. Not just anyone who qualifies. Actual 8(a) sole source awards.
Here’s what we found:
DoD small business set-aside obligations: $4.73 trillion
DoD 8(a) sole source obligations: $338.15 billion
8(a) sole source share: 7.15%
It’s not wildly off—but it’s still not 8%. And it didn’t take much to get the correct number. It just took actually checking.
Here’s that code:
# filter to DoD set-aside awards, FY15–FY25
dod_sb = df_all[
(df_all['awarding_agency_name'].str.contains('Department of Defense', na=False)) &
(df_all['source_fiscal_year'].between(2015, 2025)) &
(df_all['set_aside_tag'] != 'not_set_aside') &
(df_all['total_dollars_obligated'].notna()) &
(df_all['total_dollars_obligated'] != 0)
]
# total obligations (in billions)
dod_sb_total = dod_sb['total_dollars_obligated'].sum() / 1e9
# subset for 8(a) sole source
dod_8a_sole = dod_sb[dod_sb['set_aside_tag'] == '8a_sole_source']
dod_8a_sole_total = dod_8a_sole['total_dollars_obligated'].sum() / 1e9
# calculate percentage
dod_8a_sole_percent = (dod_8a_sole_total / dod_sb_total) * 100
# output
print(f"DoD small business set-aside obligations (FY15–FY25): {dod_sb_total:.2f}B")
print(f"DoD 8(a) sole source obligations (FY15–FY25): {dod_8a_sole_total:.2f}B")
print(f"8(a) sole source share of DoD small business set-asides: {dod_8a_sole_percent:.2f}%")
What They Claimed vs. What We Found
Bloomberg made two big claims about the 8(a) program:
That 8(a) sole source awards made up 13% of all small business set-aside dollars government-wide (FY2015–FY2024)
That 8(a) contractors received 8% of all small business obligations at the Department of Defense
At first glance, those numbers sound big—maybe even suspiciously big.
But here’s what that actually looks like in practice:
Imagine a school where the chess team gets 50% of the travel budget. That seems outrageous—is the captain the nephew of the superintendent or something?
Until you learn that only two groups use that budget: the chess team and the debate team. All the sports teams? They’re on a completely separate budget.
Suddenly, that "50%" looks a lot less sinister. It’s not the whole school’s money—it’s just half of a tiny bucket.
That’s exactly what’s happening here.
The 8(a) program didn’t get 13% of all dollars. It got 5.9% of a subset of a subset—just the contracts that were specifically set aside for small businesses. Same at DoD, where the real number was 7.15%, not 8%—and again, that’s 7.15% of set-asides only, not the entire DoD contract budget.
In other words: Bloomberg's 8% and 13% figures are what happens when you cherry-pick the denominator to create a splashy headline.
It’s lying with statistics.
Why? Probably not on purpose.
But Bloomberg’s numbers were built on vague definitions, unclear sourcing, and assumptions that didn’t hold up under inspection. They may have counted all awards to 8(a) participants, regardless of how those contracts were awarded. They may have ignored whether the awards were actually set-aside, or whether they were full-and-open competitions that just happened to go to an 8(a) firm.
We didn’t do that.
We built a tagging system that:
Distinguished program participation from contract mechanism
Respected the difference between sole source and competed awards
Flagged ambiguous edge cases instead of quietly counting them
Focused only on contracts that were formally classified as set-aside awards
No black box. No assumptions. No fudging.
Just a clear, documented pipeline that anyone can understand, copy, paste, and run themselves.
A Statistically Rigorous Look at the Actual 8(a) Impact
Now that the claims have been tested and the numbers cleaned, we can ask a better question:
What role does 8(a) really play in the federal contracting landscape?
Let’s take a statistically rigorous look at how big the 8(a) program actually is—across time, across agencies, and in context with everything else. Not just whether it exists. Not just whether it’s “too high” or “too low.” But how it fits into the real shape of government spending.
Let’s go.
Let’s start by zooming all the way out: What percentage of total Department of Defense contract dollars each year went to 8(a) sole source awards? Here’s the code to find out:
import matplotlib.pyplot as plt
# calculate percentage of total DoD contract dollars each year that went to 8(a) sole source
dod_yearly = df_all[
(df_all['awarding_agency_name'].str.contains('Department of Defense', na=False)) &
(df_all['source_fiscal_year'].between(2015, 2025))
]
# sum total dollars by year
total_by_year = (
dod_yearly.groupby('source_fiscal_year')['total_dollars_obligated'].sum()
)
# filter to 8a sole source
dod_8a_sole = dod_yearly[dod_yearly['set_aside_tag'] == '8a_sole_source']
# sum 8a sole source dollars by year
sole_by_year = (
dod_8a_sole.groupby('source_fiscal_year')['total_dollars_obligated'].sum()
)
# calculate percentage
pct_by_year = (sole_by_year / total_by_year).fillna(0)
# plot
plt.figure(figsize=(10, 6))
plt.plot(pct_by_year.index, pct_by_year.values * 100, marker='o', linewidth=2, label='8(a) Sole Source % of Total DoD Obligations')
plt.axhline(1, color='red', linestyle='--', label='1% Reference Line') # 1 percent
plt.ylim(0, 1.2) # y-axis limit for clear view of 1%
plt.xticks(ticks=range(2015, 2026)) # show all fiscal years on x-axis
plt.title("8(a) Sole Source as % of Total DoD Contract Dollars (FY2015–FY2025)")
plt.xlabel("Fiscal Year")
plt.ylabel("Percent of Total Obligations")
plt.grid(True, linestyle='--', alpha=0.5)
plt.legend()
plt.tight_layout()
plt.show()
And the visual:
So... Not Even One Percent
This chart shows the percentage of total Department of Defense contract dollars that went to 8(a) sole source awards in each fiscal year from 2015 through 2025.
It never hits even 1%.
In fact, it doesn’t even get close. Across the entire 11-year span, 8(a) sole source awards consistently made up well under 0.2% of all DoD contracting dollars. The lowest year dips below 0.08%, and the highest year barely reaches 0.16%.
That red dashed line? That’s where 1% would be. It sits well above the entire graph—because the real footprint of 8(a) sole source contracts at the Department of Defense is tiny.
This isn’t a statistical trick or a cherry-picked subset. These are all DoD obligations, every single year, including every contracting type and dollar. If 8(a) sole sourcing were some bloated giveaway, we’d see it here. Instead, it’s a rounding error.
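If you’re following along without rendering the chart, you can print the same series directly, using pct_by_year from the plotting block above:
# yearly 8(a) sole source share of total DoD obligations, in percent
print((pct_by_year * 100).round(3))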
And that’s just looking at sole source awards.
But what about the entire 8(a) program—not just the noncompetitive ones? What happens if we include all 8(a) contracts, both competed and sole source, and compare that to total DoD contract spending?
Let’s zoom out and take a broader look at the 8(a) program’s full footprint. Here’s the code:
import matplotlib.pyplot as plt
# filter to DoD records for FY2015–FY2025
dod_all = df_all[
(df_all['awarding_agency_name'].str.contains('Department of Defense', na=False)) &
(df_all['source_fiscal_year'].between(2015, 2025))
]
# total DoD obligations by year
dod_total_by_year = (
dod_all.groupby('source_fiscal_year')['total_dollars_obligated']
.sum()
/ 1e9
)
# filter to all 8(a) awards: sole source + competed
dod_8a = dod_all[
dod_all['set_aside_tag'].isin(['8a_sole_source', '8a_competed'])
]
# total 8(a) obligations by year
dod_8a_by_year = (
dod_8a.groupby('source_fiscal_year')['total_dollars_obligated']
.sum()
/ 1e9
)
# compute percentage of total DoD obligations
percent_8a = (dod_8a_by_year / dod_total_by_year) * 100
# plot
plt.figure(figsize=(10, 6))
plt.plot(percent_8a.index, percent_8a.values, marker='o', label='All 8(a) Awards % of Total DoD Obligations')
plt.axhline(1, color='red', linestyle='--', label='1% Reference Line')
plt.title("All 8(a) Awards as % of Total DoD Contract Dollars (FY2015–FY2025)")
plt.xlabel("Fiscal Year")
plt.ylabel("Percent of Total Obligations")
plt.xticks(ticks=percent_8a.index) # show every year
plt.ylim(0, max(percent_8a.max(), 1.1)) # pad a bit above max or at least 1.1 for headroom
plt.grid(True, linestyle='--', alpha=0.5)
plt.legend()
plt.tight_layout()
plt.show()
And the visual:
All 8(a) Awards—Still Nowhere Near 1%
If you zoom out and include every 8(a) contract, not just the sole-source ones, the overall picture doesn’t change much. This chart shows the share of total Department of Defense contract obligations that went to any 8(a) award—competed or sole source—from FY2015 to FY2025.
It still never breaks 1%.
In fact, even the highest year (2015) barely clears the halfway mark at 0.52%, and by 2019 the percentage bottoms out below 0.2%. There’s a slight rebound in recent years, but it hovers around a quarter of a percent—not exactly headline-worthy.
This includes every 8(a) contract we could confidently identify through set-aside codes, descriptions, and competition data. If the 8(a) program were bloated or out of control, it would show up here. It doesn’t.
Even with the widest reasonable definition, the 8(a) footprint at DoD is still microscopic.
So What Did the Pentagon Spend Its Money On?
If 8(a) contracts made up less than one percent—often less than half a percent—of the Pentagon’s total obligations, it raises the obvious question: Where did the rest go?
Let’s break it down.
The next chart shows a high-level view of how the Department of Defense distributed its contract dollars each year from FY2015 through FY2025. We’ve grouped contracts by set-aside category where available, with everything else—including full-and-open competitions and other undefinable awards—tagged as “not set aside.”
This isn’t about fraud or favoritism. It’s about context. And once you see the full landscape, the tiny footprint of 8(a) contracts gets even harder to inflate.
Here’s the code:
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
# step 0: filter to DoD records for FY2015–FY2025 (all awards, set-aside or not)
dod = df_all[
    (df_all['awarding_agency_name'].str.contains('Department of Defense', na=False)) &
    (df_all['source_fiscal_year'].between(2015, 2025))
]
# step 1: get clean total DoD obligations by year
dod_total_by_year = (
dod.groupby('source_fiscal_year')['total_dollars_obligated']
.sum()
.div(1e9)
)
# step 2: sum obligations by set-aside and year
by_year_category = (
dod.groupby(['source_fiscal_year', 'set_aside_tag'])['total_dollars_obligated']
.sum()
.unstack(fill_value=0)
.div(1e9)
)
# step 3: convert to percentage of total each year
by_year_category_pct = by_year_category.div(dod_total_by_year, axis=0) * 100
# step 4: plot
ax = by_year_category_pct.plot(
kind='bar',
stacked=True,
figsize=(14, 8),
width=0.8
)
plt.title('DoD Contract Obligations by Set-Aside Category (% of Total, FY2015–FY2025)')
plt.xlabel('Fiscal Year')
plt.ylabel('Percent of Total Obligations')
plt.xticks(rotation=45)
ax.yaxis.set_major_formatter(mticker.PercentFormatter())
plt.legend(title='Set-Aside Category', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True, axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
And the visual:
So What Did the Pentagon Spend Its Money On?
This chart answers the obvious follow-up: If 8(a) sole source awards made up less than one percent of the Pentagon’s contracting budget, where did the rest of the money go?
It went exactly where you’d expect: overwhelmingly to contracts that weren’t set aside for small businesses at all.
Year after year, the vast majority of DoD obligations—over 98%, in most years—fall into the “not set aside” category. That includes full-and-open competitions, large defense contractors, sole-source awards to major suppliers, and other mechanisms that don’t fall under any formal small business program.
The slivers at the top of each bar? Those are all the small business set-aside categories—combined. The 8(a) sole source share is so small you can barely see it.
This isn’t a gotcha. It’s not fraud. It’s context.
There’s a reason the 8(a) program was created: to carve out opportunity in a landscape that otherwise swallows small firms whole. When you zoom out and look at the full contracting picture, it becomes absurd to cast 8(a) as some kind of dominant force.
It’s not a takeover. It’s a rounding error.
That Was the Big Picture. Now Let’s Zoom In.
We’ve seen how tiny the 8(a) program looks against the full scale of Pentagon contracting. But what about within the small business world itself?
After all, 8(a) is just one of several set-aside programs designed to help specific types of small businesses compete. There’s SDVOSB for service-disabled veterans, HUBZone for underdeveloped areas, WOSB for women-owned firms, and more.
So how does 8(a) stack up against the others?
The next chart compares all the major small business set-aside categories at DoD, showing their relative share of total small business set-aside obligations for each fiscal year from 2015 to 2025. This tells us where the money went within the small business carveout—and whether 8(a) really dominated the way Bloomberg implied.
Here’s the code:
# step 1: filter to DoD-only, FY2015–FY2025, and exclude 'not_set_aside'
dod_sb = df_all[
(df_all['awarding_agency_name'].str.contains('Department of Defense', na=False)) &
(df_all['source_fiscal_year'].between(2015, 2025)) &
(df_all['set_aside_tag'] != 'not_set_aside')
]
# step 2: group by fiscal year and set-aside tag, sum obligations
sb_by_type = (
dod_sb.groupby(['source_fiscal_year', 'set_aside_tag'])['total_dollars_obligated']
.sum()
.unstack(fill_value=0)
)
# step 3: convert to percent of total small business set-aside per year
sb_by_type_percent = sb_by_type.div(sb_by_type.sum(axis=1), axis=0) * 100
# step 4: plot as stacked percentage bar chart
sb_by_type_percent.plot(
kind='bar',
stacked=True,
figsize=(12, 7),
width=0.8
)
plt.title('DoD Small Business Set-Aside Obligations by Category (% of Total, FY2015–FY2025)')
plt.xlabel('Fiscal Year')
plt.ylabel('Percent of Small Business Set-Aside Obligations')
plt.xticks(rotation=45)
plt.legend(title='Set-Aside Category', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.grid(True, axis='y', linestyle='--', alpha=0.7)
plt.show()
And the visual:
Inside the Small Business World
This is what the 8(a) program looks like within the DoD’s small business set-aside ecosystem.
Each bar represents 100% of all DoD dollars that went to small business set-asides in a given fiscal year. The categories break down which programs received those dollars—service-disabled veteran (SDVOSB), women-owned (WOSB), HUBZone, and so on.
And again, 8(a) doesn’t dominate. Not even close.
In most years, the largest chunk of small business set-aside dollars went to awards with no special subgroup designation at all—just general small business awards. Veteran-owned businesses consistently came next. Then SDVOSBs. The 8(a) categories—especially sole source—occupied modest slivers.
This matters, because Bloomberg framed 8(a) as a black hole sucking up small business funds. But the data shows a diverse field, with many different types of businesses benefiting, and 8(a) making up only one of many small slices.
Even within the carveout, it’s just one program among many—not the juggernaut they made it out to be.
We’ve seen how 8(a) awards compare to the total DoD contracting landscape. We’ve seen how they stack up against all the other small business programs combined.
Now let’s finish the story.
This last chart focuses solely on the small business set-aside world and compares individual programs—8(a) sole source, SDVOSB, HUBZone, WOSB, and others—directly against each other. No hiding behind big buckets. No collapsing categories.
If 8(a) sole source were dominating this space, this is where we’d see it.
Let’s take a look. Here’s the code:
import matplotlib.pyplot as plt
# step 1: filter to DoD-only, FY2015–FY2025, and exclude 'not_set_aside'
dod_sb = df_all[
(df_all['awarding_agency_name'].str.contains('Department of Defense', na=False)) &
(df_all['source_fiscal_year'].between(2015, 2025)) &
(df_all['set_aside_tag'] != 'not_set_aside')
]
# step 2: group by set-aside tag (no year), sum total obligations
total_by_program = (
dod_sb.groupby('set_aside_tag')['total_dollars_obligated']
.sum()
.sort_values(ascending=False)
) / 1e9 # convert to billions
# step 3: plot as a bar chart
plt.figure(figsize=(10, 6))
total_by_program.plot(kind='bar', color='slateblue')
plt.title('Total DoD Small Business Set-Aside Obligations by Program (FY2015–FY2025)')
plt.xlabel('Set-Aside Program')
plt.ylabel('Obligations (Billions)')
plt.xticks(rotation=45, ha='right')
plt.grid(True, axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
And the visual:
This chart ends the story with a mic drop.
What you’re looking at is the total amount of money obligated through each small business set-aside category at the Department of Defense from FY2015 through FY2025.
The “8(a) sole source” program—the one Bloomberg tried to paint as a bloated funnel of favoritism—ranks third. Not first. Not dominant. Third.
The vast majority of small business set-aside obligations—by far—go through the generic “small_business” and “8(a)_competed” categories. Together, they account for the lion’s share of activity.
8(a) sole source trails both of those, followed in turn by SDVOSB (Service-Disabled Veteran-Owned Small Business) and HUBZone awards.
Other programs like WOSB (Women-Owned) show up as narrow but consistent slivers—real parts of the small business ecosystem but nowhere near dominating.
In short: there’s no takeover here. No outsized footprint. No evidence of one program swallowing the rest.
And that’s striking—because if abuse, fraud, or bloat were going to show up anywhere, it would be here. Sole-source contracts are the most scrutinized precisely because they lack competition. That’s where you'd expect to find sweetheart deals, inflated prices, or favoritism—if it were happening at scale.
But 8(a) sole source doesn’t dominate even within the tiny sliver of the budget reserved for small business set-asides. It’s just one mechanism among many—and it’s smaller than most.
So if someone tells you 8(a) sole sourcing is “taking over” small business contracting at the Pentagon?
Show them this chart. Then ask them where, exactly, that takeover is supposed to be happening.
Conclusion: What the Data Actually Shows
Bloomberg’s story leaned hard on two claims:
That 8(a) sole-source awards made up 13% of all small business set-aside dollars government-wide
That 8(a) contractors got 8% of all small business obligations at the Department of Defense
Those numbers sound dramatic. They’re meant to. But once you dig into the data—carefully, transparently, and with appropriate definitions—they don’t hold up.
The truth?
That “13%” figure drops to 5.9% when you limit it to contracts that were actually set aside for small businesses and actually awarded via 8(a) sole source.
That “8%” figure drops to 7.15% under the same conditions—and again, that’s 7.15% of a small slice of the DoD budget, not of Pentagon contracting overall.
These are not small errors. They’re category mistakes dressed up as exposé.
Bloomberg cherry-picked the denominator. They used numbers that sounded like total spending shares—but were actually proportions of a filtered subset, sometimes filtered further by mechanism, sometimes not.
In doing so, they implied that 8(a) sole-source awards represent a bloated, disproportionate chunk of federal contracting. But our analysis shows something else: a modest program with no evidence of runaway dominance, even in the niche where it’s supposed to be strongest.
We didn’t use guesses or vibe-based categorizations. We built an explicit, reproducible method:
We tagged only formally designated set-aside contracts
We distinguished between sole-source and competed mechanisms
We flagged ambiguous cases instead of burying them
And we limited our analysis to the years and fields Bloomberg used—then showed our math
No headlines built on illusion. No statistics massaged into compliance. Just what’s there, and what isn’t.
Which, of course, brings us back to the real point: how to not suck at math. Or, more specifically, how to not get played by someone who does.
I hope this helped.
And if it didn’t, please know that I get up at 4:30 a.m., but I stayed up past 11 last night fixing this.
Why?
Because someone was wrong about math on the internet, and I have a moral obligation to suffer for it.
You’re welcome.