Talk:Byte

From Citizendium, the Citizens' Compendium
This article is developing and not approved.
Definition: A byte is a unit of data consisting of (usually) eight binary digits, each of which is called a bit.

missing on purpose?

Hi, I'm missing info about small- and big-endian order. IMHO it should be part of the byte story. Robert Tito |  Talk  20:54, 6 April 2007 (CDT)

I did not mention it because, quite honestly, that's largely outside my scope of knowledge. If you're knowledgeable in that area, we would appreciate a contribution to the article :) --Joshua David Williams 21:03, 6 April 2007 (CDT)
Edit - I did not realize you're an editor when I wrote that. If you're busy, I could find another user to help (Eric may be able to). In answer to your question, no, it was not excluded purposely - that is, to not include it at all. --Joshua David Williams 21:06, 6 April 2007 (CDT)

what is it

Big- and small-endian refer to the 'sign' bit: in big-endian it is at the end of the byte, in small-endian at the beginning (or the other way around; I still have to look that up). It is used in various protocols to discriminate them from others, the best known case being IPX/SPX versus TCP/IP. It took Cisco quite some problem-solving to let these two networks communicate without problems (it gave rise to their IOS version 13 and above, created when I was on the phone with them). Signs of bytes are important for the variables needed to transfer specific information. Robert Tito |  Talk  21:43, 6 April 2007 (CDT)

Should this be a separate article that deserves a mention on Byte? Remember we don't want to overwhelm the average person with too much info stuffed into the Byte article --Eric M Gearhart 04:07, 7 April 2007 (CDT)

I think it is more relevant than all the prefixes as it IS info within a Byte. Robert Tito |  Talk  09:01, 7 April 2007 (CDT)

See this jargon? That's exactly what we want to avoid. I can't make heads or tails of it. Could someone please explain this in layman's terms? --Joshua David Williams 09:47, 7 April 2007 (CDT)

Big and small endians are only the way to tell the machine what the sign of the byte is: signed (+) or unsigned. Signed means only positive values are allowed; unsigned means the whole range of number space can be used. If your address space allows for file sizes up to 4 GB and you use an unsigned int to address it, you CAN access that space. Using a signed variable allows you to address only 2 GB. Endian types only state where that sign is stored in the byte: the low bit or the high bit, nothing more, nothing less. Some compilers predominantly use big-endian variables, others small-endian ones; Windows and Unix in general use the two different styles. Robert Tito |  Talk 

I don't think I'd put it that way. First of all, the sign bit is a function of how integer values are encoded, not of bytes themselves. Big endian means most significant bit first, and little endian means least significant bit first. We write numbers in big endian form because 24 is 20 + 4, and the most significant digit comes first. Of common architectures, the i386 (including the Pentium etc.) is little endian, and virtually everything else is big endian. Oh, and you might want to mention the connection to "Gulliver's Travels". Greg Woodhouse 22:40, 12 April 2007 (CDT)
I think I'm finally starting to understand this concept clearly. I'm going to re-write the endianness section of the article to make it a bit clearer, especially of what the "most significant byte" is. --Joshua David Williams 22:49, 12 April 2007 (CDT)

OK, I will try to work in a one-liner on Byte, something like "also worth mentioning is whether a Byte is big-endian or little-endian", and a link to an Endianness article, maybe with Big endian and Little endian redirecting to it.

And yea holy crap the Wikipedia article looks more like "Look at me I can write terse technical articles" rather than striving to be reachable to the masses.

To clarify Robert's signed versus unsigned example: you would use an "unsigned" variable for a file system, because you're only going to deal with positive numbers. You would use a "signed" (meaning it has positive and negative values) address space when talking about a number that can range from -2 to positive 2 (for example).

In very very simple terms, "big endian" means you're placing importance on the leftmost numbers first. "Little endian" means you're placing importance on the rightmost numbers first.

For example: Networks generally use big-endian order; the historical reason is that this allowed routing while a telephone number was being composed.

757-421-2233 is big endian, because first comes the area code (Virginia), then 421 is the prefix (Norfolk), and then the last four numbers actually get you to the specific house.

That's the type of explanation we need in the Endianness article in my opinion --Eric M Gearhart 10:43, 7 April 2007 (CDT)

Not totally true but nice as metaphor. Robert Tito |  Talk  11:29, 7 April 2007 (CDT)
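For readers following the endianness thread above: in practice the distinction is most often seen as the order of the bytes within a multi-byte integer. Here is a minimal Python sketch, assuming a 32-bit value; the value 0x0A0B0C0D is arbitrary, and only the standard `int.to_bytes`/`int.from_bytes` methods and the `struct` module (whose `"!"` prefix selects network byte order, i.e. big-endian) are used.

```python
import struct

value = 0x0A0B0C0D  # arbitrary 32-bit value whose four bytes are easy to tell apart

big = value.to_bytes(4, byteorder="big")        # most significant byte first
little = value.to_bytes(4, byteorder="little")  # least significant byte first
network = struct.pack("!I", value)              # "!" = network byte order, i.e. big-endian

print(big.hex(" "))     # 0a 0b 0c 0d
print(little.hex(" "))  # 0d 0c 0b 0a
print(network == big)   # True

# Misreading little-endian bytes as big-endian yields a different number entirely.
print(hex(int.from_bytes(little, byteorder="big")))  # 0xd0c0b0a
```

On an x86 machine, `sys.byteorder` reports "little", which is why data written by one machine and read by another has to agree on an order.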

bigger better?

Both LaCie and Iomega have single-disk enclosures out with 1 TB disks for under US$500. The data density, however, is so high that these disks cannot be used without robust error correction. Bigger is not always better; at most, it is easier. Robert Tito |  Talk  09:36, 7 April 2007 (CDT)

What else should be added?

I did a bit of research on the topic of endianness and added a section for it. If anything I said is inaccurate, please correct it. Also, what else should we add? --Joshua David Williams 19:12, 12 April 2007 (CDT)

Hexer image

Should Image:Hexer.png be in this article? I'm not sure since it shows the data in hexadecimal format instead of binary. --Joshua David Williams 19:19, 12 April 2007 (CDT)

Well bytes can be represented in Hex or binary (or octal or decimal or...). I'd say that the caption of the image should reflect that "these values represent bytes in Hexadecimal." --Eric M Gearhart 20:02, 12 April 2007 (CDT)
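To illustrate Eric's point that one byte can be written in several bases, a short Python sketch (the byte value is arbitrary). Note that any 8-bit byte fits in exactly two hexadecimal digits, versus eight binary digits.

```python
byte = 0b10110101  # one 8-bit byte, written as a binary literal

print(f"binary:      {byte:08b}")  # 10110101
print(f"octal:       {byte:03o}")  # 265
print(f"decimal:     {byte}")      # 181
print(f"hexadecimal: {byte:02X}")  # B5
```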

Integers

Is this the place to discuss how signed values are encoded (i.e., one's complement vs. two's complement)? Greg Woodhouse 22:42, 12 April 2007 (CDT)

In my opinion, this whole topic is really a subtopic of computer architecture. And how integers are represented should get its very own article. Consult a good book on computer architecture and you'll likely find at least an entire chapter about integer representation. I don't think we should try to be a textbook here. I think, instead, we should help get readers oriented about a topic--really understand the issues and history--but students seeking to learn the ins and outs of integer representations already have many references to choose from. Pat Palmer 20:27, 23 April 2007 (CDT)
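Without trying to be a textbook, a short Python sketch may help connect this to the signed/unsigned exchange earlier on this page. It assumes two's complement, the representation used by virtually all current hardware (one's complement is not shown), and the helper names are hypothetical. The same 8-bit pattern reads as 255 when unsigned and as -1 when signed, and a 32-bit unsigned offset covers about 4 GiB while a signed one covers about 2 GiB.

```python
def as_unsigned(pattern: int) -> int:
    """Interpret an 8-bit pattern as an unsigned integer (0..255)."""
    return pattern & 0xFF


def as_signed(pattern: int) -> int:
    """Interpret an 8-bit pattern as a two's-complement signed integer (-128..127)."""
    pattern &= 0xFF
    # If the high bit is set, the value is negative: subtract 2**8.
    return pattern - 0x100 if pattern & 0x80 else pattern


def int_range(bits: int, signed: bool) -> tuple:
    """(min, max) of a two's-complement integer of the given width."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


for pattern in (0b00000001, 0b01111111, 0b10000000, 0b11111111):
    print(f"{pattern:08b}: unsigned {as_unsigned(pattern):>3}, signed {as_signed(pattern):>4}")

# 32-bit offsets: unsigned reaches 4 GiB, signed only 2 GiB (cf. the 4 GB vs. 2 GB remark).
print(int_range(32, signed=False))  # (0, 4294967295)
print(int_range(32, signed=True))   # (-2147483648, 2147483647)
```

Python's built-in `int.from_bytes(data, "big", signed=True)` performs the same two's-complement interpretation for whole byte strings.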

kibibyte?

I'd like to hear what other editors have to say, but kibibyte sounds like a neologism that never really gained acceptance. Certainly, I've never heard it used. A Google search did turn up an interesting page, though [1]. Apparently, there actually was a proposal circulated some years ago, but I don't know how far it went. As a general rule, powers of 2 are used for disk storage. For example, a typical block size on modern filesystems is 4K, meaning 4096 bytes, not 4000. On the other hand, data rates are always expressed in powers of 10. The 10 in 10base-T means 10 megabits per second, and the nominal data rate for 100base-T is 100 megabits per second. Greg Woodhouse 23:18, 12 April 2007 (CDT)

Okay, here you go

1541-2002

IEEE Trial-Use Standard for Prefixes for Binary Multiples

Status: Active
Publication Date: 2003
Page(s): 0_1- 4
E-ISBN: 0-7381-3386-8
ISBN: 0-7381-3385-X
Year: 2003
Sponsored by: SCC14

OPAC Link: http://ieeexplore.ieee.org/servlet/opac?punumber=8450

Calling this terminology "standard" overstates things, IMO. Greg Woodhouse 23:32, 12 April 2007 (CDT)

See this page as well. --Joshua David Williams 23:34, 12 April 2007 (CDT)

Yes, I saw that, too. IEC might publish a standard, but the IEEE approach is much more, well, realistic. Truth be told, I can't even find the IEC document, so I'm not sure of its status, but I think IEC is just spelling out the meaning of some new words, should you choose to use them. At best, I think this terminology can be called experimental. Greg Woodhouse 23:49, 12 April 2007 (CDT)

So how should we deal with it in this article then? --Joshua David Williams 10:09, 13 April 2007 (CDT)
I've heard it quite a bit (hehe) in the last few months, but never before that. I wouldn't call it a standard now, but it's definitely worth mentioning because the differences are going to be very large. From what I've seen, only the "1337" are using "KiB". Andrew Swinehart 10:22, 13 April 2007 (CDT)
I think it's not being used at all, and furthermore, it is unlikely to be used, so I don't think it's worth spending much article time on. Remember the attempt to introduce the metric system in the U.S.? Simply didn't work. Every student of computer programming learns immediately that a kilobyte is 1024 bytes; end of story as far as I'm concerned. Pat Palmer 14:42, 29 April 2007 (CDT)
Not true. It's being used quite a bit, with partition editors in particular (such as GParted and Acronis). It's slowly gaining momentum. Quite honestly, I also disagree with your statement about the metric system, but that's beside the point. :-) --Joshua David Williams 22:23, 29 April 2007 (CDT)
This is just your opinion, and we can agree to disagree. I've never seen it, nor found anyone who considered the 1024-byte kilobyte to be a problem at all. I'm surprised people are putting energy into something that has worked well for 50 years, when there are so many other things to worry about. Changing terminology always causes problems. I'd suffer the new term to be mentioned in the article as long as we don't oversell it. It's just a proposal that has not yet been widely adopted. Many such proposals die out after a time. We'll have to wait and see. Pat Palmer 07:29, 30 April 2007 (CDT)
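Greg's contrast between storage (powers of 2) and data rates (powers of 10) can be made concrete with a small Python sketch. It assumes the nominal 100base-T rate and ignores framing and protocol overhead, so the figures are illustrative upper bounds only; the constant names are hypothetical.

```python
BLOCK_BYTES = 4 * 2 ** 10         # a "4K" filesystem block: 4096 bytes (power of 2)
LINK_BITS_PER_S = 100 * 10 ** 6   # 100base-T nominal rate: 100,000,000 bit/s (power of 10)

bytes_per_s = LINK_BITS_PER_S // 8        # 8 bits per byte
mib_per_s = bytes_per_s / 2 ** 20         # the same rate in binary megabytes (MiB)
blocks_per_s = bytes_per_s / BLOCK_BYTES  # 4K blocks per second at line rate

print(BLOCK_BYTES)                 # 4096
print(bytes_per_s)                 # 12500000, i.e. 12.5 decimal megabytes per second
print(f"{mib_per_s:.2f} MiB/s")    # 11.92 MiB/s
print(f"{blocks_per_s:.1f} blocks/s")
```

The same link is "12.5 MB/s" or "11.92 MB/s" depending on which meaning of mega the reader assumes, which is the confusion discussed in this section.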

Differences

In a recent edit, Phillip Stewart changed the percentage of difference between a yottabyte and a yobibyte from 1.209% to 17.281%. Is this correct, a mistake, or vandalism? I used the formula (2^80)/(10^24) to calculate my number. --Joshua David Williams 00:00, 13 April 2007 (CDT)

I've reverted Phillip's version for the sake of consistency. I believe that he was incorrect. If not, please post a message regarding this. This is important information that we must know - and agree upon - when writing an article. --Joshua David Williams 00:25, 13 April 2007 (CDT)

I don't think so. See for yourself:

1 - pow(10, 24) / pow(2, 80)
= 0.17281938745

0.17281938745 * 100
= 17.281938745

so 17.2819% is right. Greg Woodhouse 00:29, 13 April 2007 (CDT)
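The disagreement above comes down to which quantity is used as the base. A minimal Python sketch, using exact `fractions.Fraction` arithmetic so no precision is lost: (2^80)/(10^24) is a ratio of about 1.209 (a yobibyte is about 20.9% larger than a yottabyte), while 1 - 10^24/2^80 gives the 17.28% by which a yottabyte is smaller than a yobibyte. The helper name `prefix_gap` is hypothetical.

```python
from fractions import Fraction


def prefix_gap(power: int) -> tuple:
    """Compare 1000**power (decimal prefix) with 1024**power (binary prefix).

    Returns two percentages:
      - how much smaller the decimal unit is, relative to the binary unit
      - how much larger the binary unit is, relative to the decimal unit
    """
    decimal = Fraction(1000) ** power
    binary = Fraction(1024) ** power
    smaller = (1 - decimal / binary) * 100
    larger = (binary / decimal - 1) * 100
    return float(smaller), float(larger)


for name, power in [("kilo/kibi", 1), ("tera/tebi", 4), ("yotta/yobi", 8)]:
    smaller, larger = prefix_gap(power)
    print(f"{name:>10}: decimal {smaller:.3f}% smaller, binary {larger:.3f}% larger")
```

The tera/tebi row comes out at roughly 9% to 10% depending on the base, consistent with the "approximately 10%" figure in the draft section quoted later on this page.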

Ah, I see my mistake now. I'll fix the table, but we'll need to check these things very carefully afterwards. --Joshua David Williams 00:32, 13 April 2007 (CDT)
I think we should use the raw numbers and not percents. IMO, they're much easier to understand and calculate. (2^10)-(10^3), (2^20)-(10^6), etc. Thoughts? Andrew Swinehart 10:38, 13 April 2007 (CDT)

I'd stick with percentages, as the point is that the differences can be substantial. (Did you see the footnote about the disk manufacturer that tried to use powers of 10 and the subsequent lawsuit?) Of course, if there's room, raw numbers might be a good thing to include, too. Greg Woodhouse

But the raw numbers actually show the difference more. KB vs. KiB is 24, MB vs. MiB is 48576, GB is 73741824, and they just keep getting bigger. I say the raw numbers show the difference much better than percents. Or, we could just add another column. Andrew Swinehart 11:06, 13 April 2007 (CDT)

Another column would be great if it fits alright. --Joshua David Williams 11:08, 13 April 2007 (CDT)

I went ahead and added a column, but I couldn't find a calculator that could tell me the exact answer for the last row, so I had to use scientific notation. If any of you can get the exact answer, that'd be great. --Joshua David Williams 11:18, 13 April 2007 (CDT)
I found the exact number here. I checked it, and it's correct. --Joshua David Williams 16:36, 13 April 2007 (CDT)
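For the raw-number column, Python integers have arbitrary precision, so every row, including the yotta/yobi one, can be computed exactly. A minimal sketch; `raw_difference` is a hypothetical helper name.

```python
# (decimal prefix, binary prefix) pairs, in increasing order of size
PREFIXES = [
    ("kilo", "kibi"), ("mega", "mebi"), ("giga", "gibi"), ("tera", "tebi"),
    ("peta", "pebi"), ("exa", "exbi"), ("zetta", "zebi"), ("yotta", "yobi"),
]


def raw_difference(power: int) -> int:
    """Exact number of bytes by which 1024**power exceeds 1000**power."""
    return 2 ** (10 * power) - 10 ** (3 * power)


for power, (decimal_name, binary_name) in enumerate(PREFIXES, start=1):
    print(f"1 {binary_name}byte - 1 {decimal_name}byte = {raw_difference(power):,} bytes")
```

The first three rows reproduce Andrew's 24, 48576 and 73741824.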

Nibble and word

Should nibble and word be combined into this article, just as megabyte is? It seems to me that there really isn't much to say about these topics that couldn't be said here briefly. --Joshua David Williams 12:47, 13 April 2007 (CDT)

Finished?

I believe this article is finished. Could an editor please take a final look at it? --Joshua David Williams 17:00, 13 April 2007 (CDT)

1024 vs. 1000 again

I really wish you would revise the section where you discuss units of storage, because the statement you make, that the use of kilobytes as a unit of measurement is "non-standard", is factually incorrect. What is correct is that, due to this potentially confusing terminology, IEC has standardized the terms kibibyte, etc. Standardizing the meaning of word B does not mean that use of word A is no longer standard, though using word B to mean something other than what IEC has defined it to mean would be. The obvious caveat here is that if the meaning of word A is also redefined, then the old use can be considered non-standard. I'm sorry to be a stickler here, but it's important to be precise. By the way, I think your use of the lawsuit to show why standard terminology matters is a good idea. You also might consider citing the IEEE document I mentioned above.

If I were you, I'd say something roughly like this (by all means rephrase and flesh it out as you see fit): Storage is measured in units that are powers of 2, but data rates are measured in units that are powers of 10. This means that in some contexts 1 kB = 1024 B, but in other contexts, 1 kB = 1000 B. This is potentially confusing (mention the lawsuit), so IEC has standardized a set of binary prefixes, and IEEE approved them as a trial-use standard.

It may be that we're on the cusp of a significant shift in terminology (rather like the new consensus that Pluto isn't a planet), but you're still going against the grain of industry practice here, and you do need to be careful not to use CZ articles for advocacy. Greg Woodhouse 22:41, 13 April 2007 (CDT)

What do you think of this?
===Conflicting definitions===
For more information, see: Binary prefix.

Traditionally, the computer world has used a value of 1024 instead of 1000 when referring to a kilobyte. This was done because programmers needed a number compatible with base 2, and 1024 is equal to 2 to the 10th power. Because of the widespread confusion between these two meanings, the International Electrotechnical Commission (IEC) has made an effort to remedy this problem. It has standardized a new system called the 'binary prefix', which replaces the word 'kilobyte' with 'kibibyte', abbreviated as KiB. This solution has since been approved by the IEEE on a trial-use basis, and may one day become a true standard.[1]

While the difference between 1000 and 1024 may seem trivial, one must note that as the size of a disk increases, so does the gap between the two meanings. The difference between 1 TB and 1 TiB, for instance, is approximately 10%. As hard drives become larger, the need for a distinction between these two prefixes will grow. This has been a problem for hard disk drive manufacturers in particular. For example, one well-known disk manufacturer, Western Digital, has recently been taken to court for its use of base 10 when labeling the capacity of its drives. This is a problem because labeling a hard drive's capacity in base 10 implies a greater storage capacity when the consumer may assume it refers to base 2. [2]

--Joshua David Williams 21:46, 14 April 2007 (CDT)

I think that's okay. Of course, IEC did standardize the meanings of the two terms, so my objection is not about whether the new terms are a true standard; it's that saying the terminology now in use is non-standard is, IMO, too strong. It might be worth noting that a big reason this terminology is confusing is that storage (e.g., drive capacities) is measured in powers of 2, but in most other cases (e.g., data rates) the convention is to use powers of 10. This means that engineers working in one field use the same word to mean 1024 bytes that engineers in another field would use to mean 1000 bytes. Now that's confusing! Greg Woodhouse 21:58, 14 April 2007 (CDT)

Okay, I added that. I'm not sure whether everything is stated in the correct order or not. I may have my dad look at it, since he's very good at that sort of thing. What's your opinion on this? --Joshua David Williams 22:04, 14 April 2007 (CDT)
I reworded it a bit. See what you think. Greg Woodhouse 22:18, 14 April 2007 (CDT)

I think that's good. Let me know what you think of the changes I made. I wasn't exactly sure what all I should explain in further detail. I remember I defined "bit" briefly in the opening sentence. --Joshua David Williams 23:03, 14 April 2007 (CDT)

No, I think what you have looks pretty good. You might want to note that Hindu-Arabic numerals use big-endian order (as in your example of 1024). Most people don't think of this at first. But right now, I think the best thing to do is get someone else's impressions of the article, maybe someone with a less technical background. Greg Woodhouse 23:19, 14 April 2007 (CDT)

I'm not sure what you mean. Are you talking about the fact that 1024 would be stored as 0142 and not 4201? --Joshua David Williams 08:03, 15 April 2007 (CDT)

I'm not sure whether there's a standard meaning for the word "standard". I think it could mean either what an organization such as IEEE declares to be the standard, or it could mean what is very commonly used.
It's a "standard" if one of the standards bodies endorses it (ISO, ECMA, ANSI, etc.). It's a "de facto" standard if it's just what everybody does (kilobyte = 1024 bytes is like that). Open proposals such as Internet RFCs can be ignored and go nowhere, or they can be widely adopted and become "standards". Pat Palmer 08:04, 30 April 2007 (CDT)
I'm going to insert "to mean 1024" after the mention of KiB because I don't find that completely clear otherwise (i.e. whether it means 1024 or 1000) until you get to the table. --Catherine Woodgold 21:57, 28 April 2007 (CDT)

ASCII

OK, so ASCII is added; where is EBCDIC? It would be a nice addition, since ASCII is in effect 7-bit and EBCDIC 8-bit. Also, the need for Unicode seems imminent. Robert Tito |  Talk  19:56, 23 April 2007 (CDT)

character encoding, charsets, and "plain text"

The notion of plain text no longer holds these days, as it is muddied by the myriad options for character encoding (including various flavors of Unicode), as well as many different character sets. I'm not sure how we should handle it here, but it's much more complicated than this article currently portrays it. I'm not sure we need to write about it in Citizendium, as there are multiple references available on the internet, but it's something to be aware of. Pat Palmer 20:23, 23 April 2007 (CDT)

It may not be something we want to dive into deeply, but I think it's definitely something we need to mention. It sounds like you know more about this topic than I do. Perhaps you could have a go at it. --Joshua David Williams 20:41, 23 April 2007 (CDT)
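To illustrate why "plain text" is ambiguous, here is a minimal Python sketch encoding the same characters with several codecs. It assumes Python's `cp037` codec as a representative EBCDIC code page (US/Canada); other EBCDIC variants exist. `encode_all` and `ENCODINGS` are hypothetical names.

```python
from typing import Dict, Optional

# Codecs to compare; "cp037" is one of Python's EBCDIC code pages (US/Canada).
ENCODINGS = ("ascii", "cp037", "latin-1", "utf-8", "utf-16-be")


def encode_all(text: str) -> Dict[str, Optional[bytes]]:
    """Encode text with each codec; None means the codec cannot represent it."""
    results: Dict[str, Optional[bytes]] = {}
    for name in ENCODINGS:
        try:
            results[name] = text.encode(name)
        except UnicodeEncodeError:
            results[name] = None  # e.g. 7-bit ASCII has no code for "é"
    return results


for text in ("A", "é"):
    for name, data in encode_all(text).items():
        shown = data.hex(" ") if data is not None else "(cannot encode)"
        print(f"{text!r} in {name:>9}: {shown}")
```

The letter "A" is 0x41 in ASCII but 0xC1 in this EBCDIC code page, and "é" takes one byte in Latin-1 but two in UTF-8, so the bytes alone do not tell you the text without knowing the encoding.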

references

We can at least give references to these articles and briefly indicate their topics. Robert Tito |  Talk  20:46, 23 April 2007 (CDT)

endianness

Endianness works at the byte level, by storing the least significant bit at the lowest or highest end of the byte, so my remark about BIT instead of BYTE was correct. Robert Tito |  Talk 

Oh oh, I think you're right. Do you have the energy to fix it? I'm done for the moment. Thanks for pointing it out! Pat Palmer 21:15, 23 April 2007 (CDT)
No big deal, yeah, I will fix it (again) hehehehe rub it in rub it in :) Robert Tito |  Talk  By the way, why did you delete +1000 and 1000+? That is exactly the difference between the two systems.
I think I did too much at one time and became absolutely brain-dead. Better quit tonight while the quitin's good. Till next wild edit. Pat Palmer 21:27, 23 April 2007 (CDT)

lol OK, will you add it tomorrow again? you take WILD WIKI editing to the next level :) Robert Tito |  Talk 

That's what I thought endianness was: the order of the bits inside the byte. I think I even had to write a program once to switch them around. Now the article says it's both, but implies it's mostly the order of the bytes. Does it still need to be changed? --Catherine Woodgold 22:03, 28 April 2007 (CDT)
Hi Catherine--thanks for joining us in here. I don't think there needs to be any change in the article. I think endian-ness is usually per byte, but closely related is the bit ordering inside the bytes. So, for example, the header files of the GNU C compiler always define both byte ordering and bit ordering for each target architecture. That's a good place to look to see how a given machine handles it. Pat Palmer 07:24, 30 April 2007 (CDT)
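Catherine mentions writing a program to switch the bits around. A minimal Python sketch of reversing the bit order within one byte follows; `reverse_bits` is a hypothetical helper name. Ordinary byte-addressed loads and stores do not expose bit order within a byte, so this typically matters for things like bit-field layout or the order in which bits are sent over a serial link.

```python
def reverse_bits(byte: int) -> int:
    """Return the 8-bit value with its bit order reversed (bit 0 <-> bit 7, etc.)."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (byte & 1)  # move the current lowest bit into result
        byte >>= 1                           # drop that bit from the input
    return result


print(f"{reverse_bits(0b00000001):08b}")  # 10000000
print(f"{reverse_bits(0b11010010):08b}")  # 01001011
```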

Revisionism (1024 vs. 1000)

The problem I have here has nothing to do with the technical merits of the proposed terminology, or even whether IEC has standardized the meaning of the new terms, but that I don't think it's appropriate for Citizendium to pursue a revisionist agenda. Going back to the example of the metric system: it would be fine to write about the history of the metric system, its near-exclusive use in scientific and technical work, or the fact that nearly every country except the U.S. has adopted the metric system (assuming this is true, and I have no reason to doubt it). What would be inappropriate is to write about how people ought to use the metric system. The distinction is very much like the distinction between descriptive and prescriptive linguistics. A Citizendium article ought(!) to be descriptive. Greg Woodhouse 11:31, 30 April 2007 (CDT)

I don't think the article is "done" yet

My compliments to all who have contributed here; it's a good accumulation of information. Now I think it can be revised into a good reading experience. I'm thinking in terms of somehow trying to write it so it can be understood by a lay person as well as useful to learning geeks. Also, I think it needs some historical overview; I'm almost certain I recall that some computers were built using 9-bit bytes, and if that was the case, then that's very interesting and should be included somewhere. Finally, I still feel that the long discussion about kibibytes does NOT belong here. I wonder if that could be broken out into its own article, or something. I strongly agree with Greg above that we should not be doing any revisionism, and even if a couple of documents in open source are using the "new" terms, they have not by any means gained widespread acceptance. I think that topic is currently overpowering the article as a whole. At any rate, I'm putting this back to status=2 for now. I feel it's not ready to be approved/frozen yet, though much progress has been made. Pat Palmer 02:04, 9 May 2007 (CDT)

OK, now I see that varying byte lengths are mentioned down in the article. That needs to go to the top. It seems inconsistent first to define a byte as definitely being 8 bits (as we do at the top), then later state that, oh by the way, it can really be any length. Pat Palmer 02:09, 9 May 2007 (CDT)

Minor typo

in the endianness section "the particulur ordering used is called" should be "the particular ordering used is called".

Fixed. In the future, you should go ahead and fix those things yourself. You're welcome to edit any page on the site for the better. Thanks for the heads up! :-) BTW, also don't forget to sign your comments using four tildes (~~~~). --Joshua David Williams 13:08, 18 May 2007 (CDT)

Hexadecimal

The inset picture of hexer mentions "hexadecimal". I think this deserves a mention in the body of the text as a common way to describe the value of 8-bit bytes (mainly due to the compactness of the notation?). Hexadecimal itself is mentioned on the page "Binary number system", which we could also consider cross-referencing.
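On the compactness point: each 8-bit byte is exactly two hexadecimal digits, versus eight binary digits, which is why hex editors show data this way. A minimal Python sketch of a hexer-style dump; the layout (offset, hex bytes, printable characters) is an assumption modeled on typical hex editors, not a description of hexer itself, and `hex_dump` is a hypothetical helper.

```python
def hex_dump(data: bytes, width: int = 8) -> str:
    """Format bytes as: offset, two hex digits per byte, then printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as "."
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{3 * width - 1}}  {text_part}")
    return "\n".join(lines)


print(hex_dump(b"Byte\x00\x01\xff\n8 bits"))
```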
  1. IEEE Trial-Use Standard for Prefixes for Binary Multiples (Accessed April 14th, 2007).
  2. Nate Mook (2006-06-28). Western Digital Settles Capacity Suit.