Be sure to have read my introduction to the IPA before you dive down this rabbit hole.
Remember that terrifying Swedish word, "sjuksköterska"?
Here are the IPA transcriptions for it: /ɧʉːkɧøːtɛʂka/ and [ˈxʷʉ̟ʷːkˌxʷœːtɛʂkʲa].
You'll notice I used two different types of brackets for the above IPA transcriptions: diagonal slashes and square brackets. In this post, I'll explain the distinction that determines which you should use (and why it matters!).
A phoneme is an abstract sound unit that is perceived to be one sound by native speakers of a language. Examples are the English /ɑ/, /ʌ/, and /æ/ phonemes. The contrasts between phonemes are the basis for being able to distinguish different words. For example, the word "cut", /kʌt/, is only one phoneme away from "caught", /kɑt/ (as it is pronounced in many American accents), or the word "cat", /kæt/, but these distinctions are completely obvious to native speakers. Someone learning English might not be able to hear the difference between these three words, and might pronounce them all as /kɑt/, simply because all three vowels sound the same to them. They aren't the same, though! This may seem like an acceptable trade-off, since learning new sounds can be challenging, but it is often what makes accents hard to understand, because listeners need extra processing to decide which word the person actually tried to say.
When learning a foreign language, learning to distinguish between different phonemes in other people's speech and learning to pronounce each phoneme in a distinguishable manner is the most important thing to do, because you then remove the primary hindrance in communication.
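The idea of two words being "one phoneme away" from each other can be sketched in a few lines of Python. This is only an illustration: it assumes each word has already been split by hand into a list of phoneme symbols, and the function name `is_minimal_pair` is made up for this example.

```python
def is_minimal_pair(word_a, word_b):
    """Return True if two phoneme lists differ in exactly one position."""
    if len(word_a) != len(word_b):
        return False
    differences = sum(1 for a, b in zip(word_a, word_b) if a != b)
    return differences == 1


# Phonemic transcriptions from the text, split by hand into phonemes.
cut = ["k", "ʌ", "t"]
caught = ["k", "ɑ", "t"]
cat = ["k", "æ", "t"]

print(is_minimal_pair(cut, cat))     # True
print(is_minimal_pair(cut, caught))  # True
print(is_minimal_pair(cut, cut))     # False: identical, so zero differences
```

A learner who merges all three vowels is, in effect, collapsing these three lists into one, which is why the listener has to guess which word was meant.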
A phoneme can be realized (pronounced) in many different ways that native speakers are often unaware of. For example, the English /t/ phoneme can be realized in at least 4 different ways in American English: [tʰ], [t], [ɾ], [ʔ]. This article does a good job of explaining when each of these pops up. It is important to understand, however, that these distinctions aren't used to distinguish different words. I could pronounce the word "button" as [bʌtən] or as [bʌʔn̩], and both would be perceived as the set of phonemes /bʌtən/. Click on the first two recordings at this Forvo page to compare. Notice how you interpret both as "button", but they clearly sound different?
Each of these realizations is called a phone, a concrete sound that is the real-world counterpart of the behind-the-scenes phoneme.
The 4 phones above are called allophones of the /t/ phoneme, because /t/ can be realized as any of the above 4 sounds depending on phonetic context, and still be perceived as the /t/ sound. Pronounce the words "tick", "stick", "meeting", and "button". If you speak American English, each of these words will have a different pronunciation of the /t/ phoneme: [tʰ], [t], [ɾ], and [ʔ] respectively. However, these are all taken by native speakers to be the same sound; most people don't even realize there is a difference!
As a language learner, distinguishing between different allophones (different pronunciations) of the same phoneme is often what makes a speaker sound very competent or even native-like. Once you've mastered the different phonemes, you can start working on pronouncing the different allophones the same way a native speaker would.
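To make "depending on phonetic context" concrete, here is a rough Python sketch that picks an allophone of /t/ for the four example words. The rules are deliberately oversimplified toy rules that only cover these words (real American English also depends on stress and syllable structure), and the names `realize_t`, `VOWELS`, and `WORDS` are invented for this illustration.

```python
VOWELS = {"i", "ɪ", "ʌ", "ə"}  # only the vowels needed for these four words


def realize_t(phonemes, i):
    """Pick a phone for the /t/ at position i, using toy context rules."""
    prev = phonemes[i - 1] if i > 0 else None
    following = phonemes[i + 1:i + 3]
    if prev is None:
        return "tʰ"  # word-initial: aspirated
    if prev == "s":
        return "t"   # after /s/: unaspirated
    if following == ["ə", "n"]:
        return "ʔ"   # before /ən/: glottal stop
    if prev in VOWELS and following and following[0] in VOWELS:
        return "ɾ"   # between two vowels: flap
    return "t"       # fallback when no toy rule applies


# Phonemic forms, split by hand into phonemes.
WORDS = {
    "tick": ["t", "ɪ", "k"],
    "stick": ["s", "t", "ɪ", "k"],
    "meeting": ["m", "i", "t", "ɪ", "ŋ"],
    "button": ["b", "ʌ", "t", "ə", "n"],
}

for spelling, phonemes in WORDS.items():
    position = phonemes.index("t")
    print(spelling, "[" + realize_t(phonemes, position) + "]")
```

The input is always the same phoneme, /t/; only the neighbouring sounds change, and that alone decides which phone comes out.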
For example, if instead of pronouncing "button" as [batɛn], which would be understandable but with a noticeable foreign accent, you pronounced it as [bʌʔn̩], you would sound much more like a native speaker, and it would also be easier for native speakers to understand what you said. Because you said it the same way they would (Americans, anyways), you've reduced the amount of processing needed to turn your speech sounds into a meaningful word.
Now, how is this useful to you as a language learner? Well, imagine you were learning English as a foreign language. You wouldn't naturally acquire the different rules for when to use which /t/ allophone the way native English speakers do as children unless you paid very careful attention to native speakers and tried to absorb the pattern by yourself. This would be a lot of unnecessary work, because you could simply look up the IPA transcriptions of words you were wondering about! If you studied the language's phonology pages on Wikipedia, you wouldn't even have to look up the transcriptions because you'd know the underlying rule! This is all possible thanks to the IPA.
Returning to my Swedish example: upon encountering the word "sjuksköterska" (which, by the way, means "nurse"), you could look up its IPA transcription and no longer have to wonder how it's pronounced.
The variant enclosed in diagonal slashes, /ɧʉːkɧøːtɛʂka/, is the phonemic, or broad, transcription, which would be valid for any variety of standard spoken Swedish. If you already knew how Swedish phonemes are realized (i.e., the rules that turn phonemes into phones, or speech sounds), the broad transcription would be enough for you to reconstruct the correct pronunciation of the word.
On the other hand, if you were still learning how Swedish phonemes are actually pronounced, the phonetic, or narrow, transcription would give you very detailed instructions on how people actually say this word. However, it's important to realize that there can be many different ways to pronounce a certain word, depending on a number of different factors such as dialect, mood, or general speaker preference for one variant over another. The transcription I gave, [ˈxʷʉ̟ʷːkˌxʷœːtɛʂkʲa], would be how a speaker of Central Standard Swedish would pronounce it. Speakers from a different part of the country would share the same phonemic transcription, because the word would consist of the same sounds in their head, but the way they actually pronounce the word would be slightly different because of their regional accent.
A parallel example is the English "button", whose broad/phonemic transcription is /bʌtən/, but whose phonetic transcription in American English is [bʌʔn̩]. The IPA lets you illustrate the process that takes the abstract components of a word, i.e. phonemes, and turns them into the actual speech sounds. Imagine how useful this would be to know if you were learning English! The combination of abstract phonemes /bʌtən/ that the word "button" consists of is pronounced as the set of concrete speech sounds [bʌʔn̩]. If you learned this rule (with the IPA to help you out), you'd have a much more understandable accent.
To recap: phonemes are abstract sound units that native speakers think of as a single sound, and words can be phonemically transcribed between diagonal slashes with the IPA to illustrate what sounds a word is made of.
Phones, on the other hand, are the concrete realizations (pronunciations) of a certain phoneme, and phonetic transcriptions of them go between square brackets. There can be many different allophones of a certain phoneme, depending on dialect, register, or any number of factors, but these are all perceived to be the same sound by native speakers.
This is useful because you can understand which sounds a word is made of just as well as a native speaker simply by looking up a transcription.
Lastly, the word "button" is the orthographical representation of the word /bʌtən/. Languages existed long before people thought to write things down, so when a word is stored in your brain, it's not stored as "button", but as /bʌtən/. All English-speaking kids first learn that the set of phonemes /bʌtən/ means something, and only later do they learn that this is written as "button". Next, they take this abstract set of phonemes and apply a set of well-defined phonological rules that then turn this into the concrete speech sounds [bʌʔn̩].
Different dialects have different rules for turning phonemes into phones, and this is why /bʌtən/ is pronounced as [bʌtən] in many British English dialects, and one of the reasons foreign accents sound foreign, but understandable. If you learn the specific rules that take phonological representations of the words (that consist of phonemes) and turn them into actual speech sounds, you'll have linguistic knowledge on par with that of a native speaker, and your accent will be extremely good!
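The idea that dialects share one phonemic form but apply different rules to it can be sketched as a small Python lookup of rule functions. Everything here is a simplified assumption for illustration: the "american" rule only handles the /tən/ sequence in "button", the "british" rule simply mirrors the [bʌtən] realization mentioned above, and all function and variable names are hypothetical.

```python
def american_rules(phonemes):
    """Toy rule: /t/ + /ə/ + /n/ becomes a glottal stop plus syllabic n."""
    phones = []
    i = 0
    while i < len(phonemes):
        if phonemes[i:i + 3] == ["t", "ə", "n"]:
            phones.extend(["ʔ", "n̩"])
            i += 3
        else:
            phones.append(phonemes[i])
            i += 1
    return phones


def british_rules(phonemes):
    """Toy rule set: every phoneme is realized unchanged."""
    return list(phonemes)


# Same phonemic input, different rule set per dialect.
DIALECT_RULES = {"american": american_rules, "british": british_rules}


def realize(phonemes, dialect):
    """Return a bracketed phonetic transcription for the given dialect."""
    return "[" + "".join(DIALECT_RULES[dialect](phonemes)) + "]"


button = ["b", "ʌ", "t", "ə", "n"]  # the phonemic form /bʌtən/
print(realize(button, "american"))
print(realize(button, "british"))
```

Both calls start from the same list of phonemes; only the rule set differs, which is the sense in which the two accents "agree" on the word while pronouncing it differently.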
This was a basic introduction to one of the most important distinctions in the linguistic fields having to do with actual speech. I think this is something worth knowing, and I hope you agree!