What is the IPA?
IPA stands for "International Phonetic Alphabet", and the idea is to have a system of symbols that unambiguously transcribes human speech. There are no spelling rules or writing conventions to get in the way; it's purely a sound-to-symbol system, which makes it really helpful for learning how other languages are pronounced.
Check out this chart and click around.
The vowel chart
The vowel chart is a trapezoidal shape for a reason; it's meant to represent the human mouth and possible tongue positions. From a vowel's position on the chart, you can work out approximately where the tongue should be when articulating it.
Here's how the vertical dimension works: the more you open your mouth, the lower on the chart the sound you end up pronouncing is. Notice how open your mouth is when you say the word "caught", /kɑt/; it's much more open than when you say, for example, "feet", /fit/. That's why /ɑ/ is at the bottom of the chart, to represent how open your mouth is, while /i/ is all the way at the top, since your mouth is almost closed when you say this sound.
The side-to-side dimension represents your tongue position. Imagine a mouth facing left. The farther to the left a sound is on the chart, the farther forward your tongue is in your mouth when you say that sound. Note how your tongue is at the very front of your mouth when you pronounce the /i/ in "feet", and at the very back of your mouth when you pronounce the /u/ in "food", /fud/.
This is particularly useful for learning new vowel sounds, because it then becomes possible to have an idea of where to place your tongue instead of blindly guessing until you hit a spot that sounds right.
Try moving your tongue around and making the different sounds, and note the correspondences.
The vowels come in groups of two, for the most part. The left symbol is the unrounded variant, and the right one is the rounded variant. This feature is independent of tongue position; essentially, it's saying whether or not your lips are pursed when pronouncing the sound. For example, the /y/ sound in the upper left corner is pronounced the same way as the /i/ sound (like in English "tea"), except the lips are rounded. Just imagine that your tongue is pronouncing the /i/ in "tea", but your lips are pronouncing the /u/ in "two".
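For readers who think in code, the chart's three dimensions can be sketched as a small lookup table. This is only an illustration: the names `Vowel`, `VOWELS`, and `rounding_counterpart` are made up for this example, the table covers just the vowels mentioned so far, and the labels "close", "open", "front", and "back" are the conventional names for the top, bottom, left, and right of the chart.

```python
from typing import NamedTuple, Optional


class Vowel(NamedTuple):
    height: str    # "close" (top of the chart) to "open" (bottom of the chart)
    backness: str  # "front" (left of the chart) to "back" (right of the chart)
    rounded: bool  # whether the lips are rounded


# A hand-picked subset of the chart (illustrative, not complete).
VOWELS = {
    "i": Vowel("close", "front", False),  # "feet"
    "y": Vowel("close", "front", True),   # tongue of /i/, lips of /u/
    "u": Vowel("close", "back", True),    # "food"
    "ɑ": Vowel("open", "back", False),    # the open vowel at the bottom of the chart
}


def rounding_counterpart(symbol: str) -> Optional[str]:
    """Return the vowel with the same tongue position but opposite lip rounding."""
    v = VOWELS[symbol]
    for other, o in VOWELS.items():
        same_position = (o.height, o.backness) == (v.height, v.backness)
        if same_position and o.rounded != v.rounded:
            return other
    return None  # the partner is not in this small table


print(rounding_counterpart("i"))  # y
```

Asking for the counterpart of "u" returns `None` here only because its unrounded partner was left out of the table, not because it doesn't exist.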
One thing that's particularly relevant for English speakers is that vowel sounds are often combined to create what's called a diphthong. This is when two vowel sounds are pronounced so that the first gradually glides into the second. For example, the English word "five", /fa͡ɪv/, contains a diphthong in which the /a/ vowel gradually slides into an /ɪ/ vowel (like in the word "fit"). English is full of these (check out this page for a comprehensive list), and it's a common mistake for native speakers of English to pronounce vowel sounds in other languages as diphthongs, when those sounds should actually be held constant without slipping into another sound.
In French, for example, it can be tempting to pronounce the French /e/ as an /e͡ɪ/ (as in "eight"), but that wouldn't be correct. Compare the first recording on this Forvo page for French "mes" with this one for English "May". Note the different vowel qualities.
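As a side note, the little arc joining the two vowels in a transcription like "five" is a single Unicode character, the tie bar (U+0361), written between the two symbols it joins. That means tied pairs can be picked out of a transcription mechanically. The sketch below is just an illustration: the function name `find_tied_pairs` is hypothetical, and it assumes the transcription uses the tie bar at all (many leave it out). The tie bar is also used for affricates, so this finds tied pairs in general, not only diphthongs.

```python
from typing import List

TIE_BAR = "\u0361"  # COMBINING DOUBLE INVERTED BREVE, placed between the tied symbols


def find_tied_pairs(transcription: str) -> List[str]:
    """Return each pair of symbols joined by a tie bar, with the tie bar removed."""
    pairs = []
    for i, ch in enumerate(transcription):
        # A tie bar needs one symbol before it and one symbol after it.
        if ch == TIE_BAR and 0 < i < len(transcription) - 1:
            pairs.append(transcription[i - 1] + transcription[i + 1])
    return pairs


# "five" with the tie bar written as an escape so it is visible in the source
print(find_tied_pairs("fa\u0361\u026av"))  # ['aɪ']
```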
When it comes to vowels, experimentation is the name of the game. Use the vowel chart to put your tongue in the right area, then repeat a bunch of times until what you say matches a native speaker's recording (Forvo is an excellent resource for this).
The consonant chart
Next up are the consonants. This chart can look intimidating at first, but it's organized very simply once you understand the two axes.
Manners of articulation
The vertical labels on the left are manners of articulation. To understand what this means, pronounce /b/ as in "Bob", then /v/ as in "variable". Both are consonantal sounds because they obstruct the flow of air through your oral cavity, but they do so differently. With /b/, the air is stopped completely, then released again. This is why you can't sustain a /b/ sound; eventually, the pressure from your lungs builds up in your mouth and you're forced to release it. This type of consonant is called a "plosive", the first row in the table.
A /v/ sound, on the other hand, can be sustained. It doesn't fully block the flow of air through your mouth. Rather, it just constricts the passage that the air goes through to create a recognizable sound. Sounds like /v/, which allow air to keep passing through (and which can be sustained for as long as you can breathe out), are called "fricative" consonants, because the sound is generated by the friction of the air against the place of articulation (more on this in a moment).
The other manners of articulation are, I think, relatively straightforward. Trills are made by trilling your tongue (or, in a few languages, your lips; think raspberry). Taps are when your tongue briefly taps a part of your mouth, for example "butter" /bʌɾɚ/, where the "tt" is an alveolar tap in many varieties of English.
Nasals are sounds where the air flows out through your nose instead of your mouth, like the /m/ in "mom", /mɑm/.
Lateral approximants and lateral fricatives are when your tongue is shaped in such a way that air flows past the sides of your tongue. In lateral fricatives, there's also friction between your tongue and mouth. Lateral fricatives are pretty uncommon sounds, but you'll get them if you practice.
Lastly, approximants are when your tongue isn't actually touching anything; there's a larger space in your mouth for the air to flow through. These sounds are something between vowels and consonants.
Places of articulation
Now, notice where /b/ and /v/ are pronounced. A /b/ is pronounced with both lips closed, so its place of articulation is said to be "bilabial". If you look at the chart, you'll see the /b/ at the intersection of "bilabial" and "plosive".
A /v/ is made by the upper teeth touching the lower lip, and is said to be a "labiodental" consonant.
Dental consonants are made with your tongue touching your teeth, like the "th" in English "father" /fɑðɚ/. This sound is a voiced dental fricative.
Alveolar consonants are where your tongue touches the alveolar ridge just behind the upper teeth, like the "t" in "tea" /ti/, a voiceless alveolar stop ("stop" is another name for a plosive).
In post-alveolar consonants, your tongue is further back than with alveolar consonants, like the "sh" in "fish", /fɪʃ/, which is a voiceless post-alveolar fricative.
Retroflex consonants are where your tongue tip is curled back. The English "r" sound is pronounced like this by many speakers. These consonants are also what give many languages spoken in India their characteristic sound, I find.
Palatal consonants are where your tongue touches the hard palate, essentially the middle of the roof of the mouth. The "y" sound in "yes", /jɛs/, is a voiced palatal approximant.
Velar consonants are where your tongue touches the velum, or the back of the roof of your mouth. The "k" in "cat", /kæt/ is a voiceless velar stop.
Uvular consonants are pronounced even further back in your mouth. Think of the French "r" sound - that's a voiced uvular fricative. The Scottish "ch" sound in "loch" is a similar voiceless sound, although it's usually classified as a velar fricative, /x/, rather than a uvular one.
The last thing to talk about is voicing, which you're probably wondering about at this point. This is just referring to whether or not your vocal cords are vibrating when you pronounce a sound.
Say "easy", /izi/, out loud. Now say "icy", /a͡ɪsi/, out loud. Notice how in "easy", your vocal cords vibrate through the whole word, whereas in "icy", you almost whisper the "s" sound in the middle? That's what voicing means. /z/ is a voiced alveolar fricative, and /s/ is its unvoiced counterpart.
That's why there are two symbols in most of the cells - they're the same sounds, but the left one is unvoiced and the right is voiced.
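Put together, every consonant name is just three lookups: voicing, place (the column), and manner (the row). Here's a minimal sketch of that idea. The names `Consonant`, `CONSONANTS`, `describe`, and `voicing_counterpart` are invented for this example, and the table only includes the consonants used as examples in this post, not the full chart.

```python
from typing import NamedTuple, Optional


class Consonant(NamedTuple):
    voiced: bool  # are the vocal cords vibrating?
    place: str    # column of the chart
    manner: str   # row of the chart


# Only the example consonants from this post (illustrative, not the full chart).
CONSONANTS = {
    "b": Consonant(True, "bilabial", "plosive"),
    "v": Consonant(True, "labiodental", "fricative"),
    "ð": Consonant(True, "dental", "fricative"),
    "t": Consonant(False, "alveolar", "plosive"),
    "s": Consonant(False, "alveolar", "fricative"),
    "z": Consonant(True, "alveolar", "fricative"),
    "ʃ": Consonant(False, "post-alveolar", "fricative"),
    "j": Consonant(True, "palatal", "approximant"),
    "k": Consonant(False, "velar", "plosive"),
}


def describe(symbol: str) -> str:
    """Build the three-part name: voicing, place, manner."""
    c = CONSONANTS[symbol]
    voicing = "voiced" if c.voiced else "voiceless"
    return f"{voicing} {c.place} {c.manner}"


def voicing_counterpart(symbol: str) -> Optional[str]:
    """Return the symbol sharing the same chart cell but with opposite voicing."""
    c = CONSONANTS[symbol]
    for other, o in CONSONANTS.items():
        if (o.place, o.manner) == (c.place, c.manner) and o.voiced != c.voiced:
            return other
    return None  # the partner is not in this small table


print(describe("z"))             # voiced alveolar fricative
print(voicing_counterpart("z"))  # s
```

A `None` result here just means the other half of the pair was left out of the table, as with /b/, whose voiceless partner isn't listed.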
This was a lot to take in, I know. Don't worry if this doesn't all make sense to you yet - the important thing is that you're aware of what the IPA is and have started to understand what the different symbols mean. The IPA is incredibly useful in learning a language's pronunciation, and I really recommend you take the time to at least get familiar with it.