Why is my PDF file so big and how to reduce PDF file size?
Do you have a big PDF file? We love big PDF files! It’s kind of our thing. Maybe you don’t know why it is big or perhaps you don’t have the tools to fix it. Well, we love finding out what makes files big and all the many ways they can be made small again. So today we are going to show you how to figure out exactly what is making your PDF file so heavy — and give you some tools and techniques for reducing its file size.
Starting with why…
Before we dive in, it’s worth noting that there are many, many reasons why PDF files can end up so big, and there are lots of different tools and approaches that can be used to downsize them. So it’s important to start with a clear understanding of the type of content that is currently lurking in your PDF file. Because when you know that, you can use the right tool for the job and just get on with your day.
6 common reasons PDF files get too large
LARGE IMAGES — obvious, but not always as easy to spot as you might think
PIECE INFORMATION — extra data saved inside the PDF file, by applications like Adobe Illustrator, Photoshop etc.
CONTENT STREAMS — PDF files created with the text, layout and images all combined together into impenetrable content streams
EMBEDDED FONTS — some fonts can be surprisingly large.
OVERSIZED PAGES — e.g. artwork for posters
EMBEDDED FILES/FILE ATTACHMENTS - it’s possible to attach many types of files to a PDF and these files are stored inside the PDF contributing to its total size.
Find out what is making your PDF file big
Helpfully, there is a free tool that lets you see exactly why your PDF is so large.
The online PDF file analyzer from WeCompress will show you a breakdown of the content in your PDF file:
In the above example we can see 4 of the 6 common reasons broken out explicitly and 2 are grouped under the Other heading:
Images: 9,587 KB
Content Streams: 27 KB
Fonts: 25 KB
Page Size: 9.5” x 12.0”
Piece Information is grouped under Other: 27,392 KB (27 MB)
Embedded Files are also grouped under Other.
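If you are curious what a breakdown like this looks for, here is a minimal Python sketch that counts a few well-known PDF dictionary keys in the raw bytes of a file. It is a rough heuristic and not how the WeCompress analyzer works: it only counts occurrences, it does not measure sizes, and it will miss anything stored inside compressed object streams. The `MARKERS` table and `count_markers` name are our own, for illustration only.

```python
import re

# Byte patterns for PDF dictionary keys that usually accompany each content type.
# Heuristic only: keys inside compressed object streams will not be found.
MARKERS = {
    "images": rb"/Subtype\s*/Image\b",
    "embedded_fonts": rb"/FontFile[23]?\b",
    "piece_information": rb"/PieceInfo\b",
    "embedded_files": rb"/Type\s*/EmbeddedFile\b",
}


def count_markers(pdf_bytes: bytes) -> dict[str, int]:
    """Count how often each marker appears in the raw bytes of a PDF."""
    return {
        name: len(re.findall(pattern, pdf_bytes))
        for name, pattern in MARKERS.items()
    }


# Tiny synthetic fragment (not a valid PDF), used only to exercise the function.
sample = (
    b"<< /Type /XObject /Subtype /Image /Width 10 >>\n"
    b"<< /Type/XObject/Subtype/Image/Width 20 >>\n"
    b"<< /FontFile2 12 0 R >>\n"
    b"<< /PieceInfo << /Illustrator 5 0 R >> >>\n"
    b"<< /Names << /EmbeddedFiles 7 0 R >> >>\n"
)
counts = count_markers(sample)
print(counts)
```

To try it on a real file you could pass in the result of `pathlib.Path("example.pdf").read_bytes()`, where `example.pdf` is a placeholder for your own file.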
The analyzer also provides a calculated Size Per Page figure. If this is below 100 KB then you are not likely to be able to further reduce the file significantly. In the example above you can see it’s over 6,172 KB (6 MB) per page - so definitely some scope for reduction.
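The Size Per Page figure is simple arithmetic, and a small Python sketch shows the idea. The 100 KB threshold comes from the guidance above. The function names are our own, and the total of 37,031 KB (the sum of the four figures listed above) and the page count of 6 are assumptions chosen to roughly match the example.

```python
def size_per_page_kb(total_kb: float, page_count: int) -> float:
    """Average size of one page in KB."""
    if page_count <= 0:
        raise ValueError("page_count must be a positive integer")
    return total_kb / page_count


def worth_compressing(total_kb: float, page_count: int,
                      threshold_kb: float = 100.0) -> bool:
    """True when the per-page size is at or above the rule-of-thumb threshold."""
    return size_per_page_kb(total_kb, page_count) >= threshold_kb


# Assumed figures: 9,587 + 27 + 25 + 27,392 KB across an assumed 6 pages.
example_total_kb = 9587 + 27 + 25 + 27392
per_page = size_per_page_kb(example_total_kb, 6)
print(round(per_page), worth_compressing(example_total_kb, 6))
```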
US letter is 8.5 x 11 inches - so if your PDF is a lot larger than this then that will definitely be a factor in driving up the file size. Check out this banner artwork example, where you can see the page size is very large:
How to reduce PDF file size
So now you know what content is responsible, let’s learn how to fix it.
There is a section below on dealing with each of the 6 common reasons, so you can skip straight to the one you need.
How to compress large images in PDF files
You may be able to tell just by looking at the document that it’s full of large full-colour images and know they are likely to blame for the file size. In this case the best option is to use a PDF Compressor like NXPowerLite Desktop.
These tools are designed to compress PDF files quickly and simply by automatically resizing images, optimizing image formats and adjusting quality levels while removing background or hidden data that isn't needed for normal use of the file.
Extracting images from a PDF file to hand optimize them is not really a practical option, hence the suggestion to use a compressor. However, if the file was created in another application and you have access to the source file, then your best option might be to save a new PDF from the source and check for options to downsample the images.
This is the save PDF dialog in Photoshop for example, which may have a preset for smallest size, but you are looking for compression options including downsampling images as highlighted below.
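To get a feel for how much downsampling can save, here is a rough Python sketch of the arithmetic. It works out the effective resolution of an image from its pixel width and the width it is placed at on the page, then the pixel dimensions after downsampling to a target resolution. The function names and the example image are hypothetical, and real export dialogs apply their own rules about when to downsample.

```python
def effective_ppi(pixel_width: int, placed_width_in: float) -> float:
    """Pixels per inch of an image at the size it appears on the page."""
    return pixel_width / placed_width_in


def downsampled_dimensions(pixel_width: int, pixel_height: int,
                           placed_width_in: float,
                           target_ppi: float) -> tuple[int, int]:
    """Pixel dimensions after downsampling to target_ppi (never upsamples)."""
    current = effective_ppi(pixel_width, placed_width_in)
    if current <= target_ppi:
        return pixel_width, pixel_height
    scale = target_ppi / current
    return round(pixel_width * scale), round(pixel_height * scale)


# Hypothetical image: 3000 x 2000 pixels placed 5 inches wide, so 600 ppi.
new_w, new_h = downsampled_dimensions(3000, 2000, 5.0, 150)
pixel_ratio = (new_w * new_h) / (3000 * 2000)
print(new_w, new_h, pixel_ratio)
```

Halving the resolution quarters the number of pixels, which is why downsampling is usually the most effective setting to look for.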
What if your images look like text?
Sometimes scanned PDF documents look like they only contain text, when in fact every page is a full-page image of the text. A PDF Compressor can often reduce these files by optimizing the images, but you may get better results using Optical Character Recognition (OCR) to convert the images to text.
One way to check if your text is actually images of text is to try and select some text using Adobe Reader (or your preferred PDF reader software). As you can see from the example below, if you are presented with a frame around the text rather than a text editor cursor then you can be sure that the text is an image representation of the text.
If you have a PDF with scanned content you may be able to reduce the file size by using an Optical Character Recognition (OCR) process on the content. We have used Soda PDF’s online OCR service with success previously. This will convert those image representations of text back to actual text and image elements, which can significantly reduce the size. Be aware that this method won’t work on every scanned PDF and the OCR process can sometimes make the PDF file bigger!
How to remove "Piece Information" to reduce PDF size
Piece Information can include all kinds of proprietary information that an application wants to save inside a PDF file. It isn’t required to display or print the PDF, instead it will be there for other purposes.
The most common cause is saving a PDF from an app like Photoshop or Illustrator with the Preserve Photoshop / Illustrator Editing Capabilities option left checked. This will save a complete copy of the .AI/.PSD file inside the PDF to allow for future editing in the original application.
If your PDF file contains a large Piece Information component then you can compress it using a PDF Compressor like we suggested above for images. Applications like NXPowerLite will usually have a setting that allows you to remove Piece Information, or as it’s also called Private Application Data.
Alternatively, if you have access to the source file the PDF was created from, for example, an Adobe Photoshop or Illustrator file, then you can simply re-export the PDF and make sure to uncheck the Preserve Editing Capabilities setting to avoid the Piece Information being saved in the file.
How to compress PDF Content Streams
Most applications make use of image and text markup to create PDF content items, but some applications create PDFs that use Content Streams. These are essentially the contents of the pages: the text and any line drawings. When content streams are used, a page in a PDF document has one or more content stream parts that together contain all the PDF page description commands for the page. The problem is that because all of the content is stored in a ‘Stream’ of data there is no real way of identifying which piece of content is driving up the file size.
Unlike images, which can be resized or recompressed at a lower quality to reduce their size, content streams tend to be large and cannot be directly compressed in the same way. However, there are workarounds to reduce the size of PDF files made from content streams. The main one is to try and reprint the PDF using a browser PDF printer and then use a PDF compressor to reduce the size of the resulting file. It’s very easy and we show you how to do it in this support article.
How to subset or remove fonts to reduce PDF file size
If you want to ensure fonts look the same on every device the PDF is shown on, it is a great idea to embed the fonts. Even if the host machine does not have the fonts installed, this will guarantee the fonts display correctly and the document’s layout remains as the editor intended.
However, embedding the fonts comes at a cost of increasing file size. If a document has multiple fonts or double-byte fonts this can mean the file size increases by many megabytes.
If you have seen that Fonts are a big factor in your PDF file size then your best option in most cases will be to subset the fonts in the PDF, which will remove any unused characters from the embedded font set. This can reduce the file size but will also mean that editing of the file in the future will require the fonts to be loaded on the host machine.
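The idea behind subsetting can be shown with a toy Python sketch. It is not a real font tool: the "font" here is just a made-up table giving every printable ASCII character a flat cost of 200 bytes, and `subset_font` is a hypothetical helper that keeps only the characters the document actually uses. Real font files are structured very differently, but the principle of dropping unused characters is the same.

```python
def subset_font(glyph_sizes: dict[str, int], text: str) -> dict[str, int]:
    """Keep only the glyphs for characters that appear in the text."""
    needed = set(text)
    return {ch: size for ch, size in glyph_sizes.items() if ch in needed}


# Hypothetical font: each printable ASCII character costs a flat 200 bytes.
toy_font = {chr(code): 200 for code in range(32, 127)}

subset = subset_font(toy_font, "Hello PDF")
full_bytes = sum(toy_font.values())
subset_bytes = sum(subset.values())
print(full_bytes, subset_bytes)
```

A short document that uses only a handful of characters keeps a small fraction of the font, and the saving grows with large character sets such as double-byte fonts.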
You can check the list of fonts used in your PDF and also whether they are embedded or subsetted using Adobe Reader. Just click on the File menu and then Properties and switch to the Fonts tab. If there is nothing in brackets after the name it is NOT embedded - otherwise it will indicate in brackets next to each font whether it is an Embedded Subset or if the full font is Embedded.
Most PDF editors support removal or subsetting of embedded fonts. If you don’t have access to one then use a free online PDF compressor or an offline PDF compressor software which can subset the fonts for you.
How to change page dimensions to reduce PDF file size
While most page sizes for PDFs are either standard A4 or US Letter, some are significantly larger. Take a design for a banner which needs to be printed. These need to be designed at the actual size needed for printing. In order for the PDF to print well, high-resolution graphics and content are likely to be used. This obviously drives up the file size significantly and you’ll have to be more careful in your compression options if you want to reduce the file size without adversely affecting the quality of print.
Most PDF viewing or editing applications don’t highlight the page size so you’ll have to delve into the properties of the file to find this information. You can also use the PDF size analyzer on WeCompress and expand the Document Properties to see the page size.
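If the properties you find give the page size in points rather than inches, the conversion is straightforward, since the default PDF unit is 1/72 of an inch. The Python sketch below converts a page size to inches and compares its area with US Letter. The function names are our own, and it assumes the file uses the default unit.

```python
POINTS_PER_INCH = 72  # default PDF unit: 1 point = 1/72 inch
US_LETTER_IN = (8.5, 11.0)


def page_size_inches(width_pt: float, height_pt: float) -> tuple[float, float]:
    """Convert a page size from points to inches."""
    return width_pt / POINTS_PER_INCH, height_pt / POINTS_PER_INCH


def area_vs_us_letter(width_pt: float, height_pt: float) -> float:
    """How many times larger (by area) the page is than US Letter."""
    width_in, height_in = page_size_inches(width_pt, height_pt)
    return (width_in * height_in) / (US_LETTER_IN[0] * US_LETTER_IN[1])


# 684 x 864 points is the 9.5" x 12.0" page from the earlier example.
print(page_size_inches(684, 864))
print(round(area_vs_us_letter(684, 864), 2))
```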
If you don’t need the PDF at full page size, you can resize (scale) the PDF pages using an online tool like Docupub. Alternatively if you have Adobe Acrobat Pro you can scale PDF pages using their preflight tool. Scaling content in PDF files is not an easy process and can sometimes result in the content looking strange - so checking is advised.
Congratulations! You’ve now dealt with your problem file and can get on with your day. 🏆
If, however, you encounter this problem often and are curious about whether you can change things so that you don’t have to regularly deal with oversized PDFs, then read on a little more. We’ll shed a little light on a couple of common scenarios that may sound familiar.
Why do scanned PDFs get so large?
This is most often due to either poorly configured scanners, or old scanner software. This can result in scanners capturing document pages as high resolution images without any compression. If your PDFs are regularly being created too large, then you may be best checking your scanner configuration or in some cases upgrading your scanner. Newer scanners should be able to scan optimized PDFs as either small optimized images, or using OCR to create smaller text-based PDFs.
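A quick calculation shows why uncompressed, high-resolution scans get so big. The Python sketch below estimates the raw size of one scanned page from its dimensions, the scan resolution and the bytes stored per pixel. The function name is our own, and the figures assume a US Letter page with 3 bytes per pixel for full colour or 1 byte per pixel for greyscale.

```python
def raw_scan_bytes(width_in: float, height_in: float, dpi: int,
                   bytes_per_pixel: int = 3) -> int:
    """Uncompressed size in bytes of one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


colour_300 = raw_scan_bytes(8.5, 11, 300)       # full colour at 300 dpi
colour_600 = raw_scan_bytes(8.5, 11, 600)       # full colour at 600 dpi
grey_300 = raw_scan_bytes(8.5, 11, 300, 1)      # greyscale at 300 dpi

for label, size in [("300 dpi colour", colour_300),
                    ("600 dpi colour", colour_600),
                    ("300 dpi greyscale", grey_300)]:
    print(f"{label}: {size / 1_000_000:.1f} million bytes per page")
```

Doubling the resolution quadruples the raw size, so a scanner set to a high resolution with compression switched off can easily produce tens of megabytes per page.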
If this isn’t an option then the PDF compressors above can help you reduce the size of the PDF scans on an ad-hoc basis. If you need it to be automated, then Neuxpower offers a Server PDF Compressor that can be configured to automatically compress PDFs that arrive into specific folders.
Why do Photoshop/Illustrator/Office PDFs get so large?
For Photoshop and Illustrator we’ve already mentioned the Preserve Editing Capabilities feature and also the need to make sure images are routinely downsampled appropriately.
The other common workflow that can result in overly large PDFs affects those created from Microsoft Office as well. To ensure the PDF displays well on every device some designers or PDF editors may opt to convert fonts to outlines. For example, if the document is to be printed and it contains a typeface that your print company does not have on their system, then the font will be substituted for another font, which will likely alter the layout and mean it will be printed incorrectly.
The process of converting text to outlines means that the text is no longer text: it has become a graphic, and the text cannot be altered. Once the text has been converted to graphics it will look the same on any device no matter which fonts are or are not installed. Greater fidelity does, however, come at the expense of file size.
Microsoft Office also does a similar thing by default when you save as a PDF. The option is to Bitmap text when fonts may not be embedded (shown below). When selected, any text using fonts that are not installed on the host machine or cannot be embedded at the point of exporting to PDF, will be converted to images. This ensures that you don’t end up creating a PDF that looks dramatically different to your Word document - but it can significantly increase the file size.
To avoid this problem we suggest trying to use fonts that can be embedded. Microsoft have expanded their cloud fonts offering and there is a great range of options now. Look out for the cloud symbols in your Microsoft Application font list. For more information, Julie Terberg has created the definitive guide to using cloud fonts.
How to remove attached files to shrink PDF size
If your PDF file contains embedded files that you don’t need, then you’ll need a PDF editor to remove them. If you do have Adobe Acrobat, then you can use the Audit Space Usage tool, which will show you the size of any embedded files like this:
Read here for instructions on how to edit or remove attached files in your PDF files.
Related
Working on a Mac? Check out Why converting a PPTX to PDF on Mac can create huge files
Need to compress PDF files on a Server? Try NXPowerLite Server